GENSET.037PR PROVISIONAL

TITLE OF THE INVENTION : Nucleic acids encoding human TBC-1 protein and 
polymorphic markers thereof. 



FIELD OF THE INVENTION

The invention concerns genomic and cDNA sequences of the human TBC-1 gene, a biallelic 
marker of the TBC-1 gene and the association established between this marker and prostate 
cancer. The invention provides means to determine the predisposition of individuals to prostate 
cancer as well as means for the diagnosis of this cancer and for the prognosis/detection of an 
eventual treatment response to therapeutic agents acting against prostate cancer.

BACKGROUND OF THE INVENTION

Prostate cancer

The incidence of prostate cancer has dramatically increased over the last decades. It averages
30-50/100,000 males in Western European countries as well as within the US White male
population. In these countries, it has recently become the most commonly diagnosed
malignancy, being one of every four cancers diagnosed in American males. Prostate cancer's
incidence is very much population specific, since it varies from 2/100,000 in China, to over
80/100,000 among African-American males.

In France, the incidence of prostate cancer is 35/100,000 males and it is increasing by
10/100,000 per decade. Mortality due to prostate cancer is also growing accordingly. It is the 
second cause of cancer death among French males, and the first one among French males 
aged over 70. This makes prostate cancer a serious burden in terms of public health. 

Prostate cancer is a latent disease. Many men carry prostate cancer cells without overt signs of
disease. Autopsies of individuals dying of other causes show prostate cancer cells in 30 % of 
men at age 50 and in 60 % of men at age 80. Furthermore, prostate cancer can take up to 10 
years to kill a patient after the initial diagnosis. 

The progression of the disease usually goes from a well-defined mass within the prostate to a
breakdown and invasion of the lateral margins of the prostate, followed by metastasis to 
regional lymph nodes, and metastasis to the bone marrow. Cancer metastasis to bone is 
common and often associated with uncontrollable pain. 

Unfortunately, in 80 % of cases, diagnosis of prostate cancer is established when the disease
has already metastasized to the bones. Of special interest is the observation that prostate 
cancers frequently grow more rapidly in sites of metastasis than within the prostate itself. 

Early-stage diagnosis of prostate cancer mainly relies today on Prostate Specific Antigen (PSA) 
dosage, and allows the detection of prostate cancer seven years before clinical symptoms 
become apparent. The effectiveness of PSA dosage diagnosis is however limited, due to its 
inability to discriminate between malignant and non-malignant affections of the organ and 
because not all prostate cancers give rise to an elevated serum PSA concentration. 
Furthermore, PSA dosage and other currently available approaches such as physical
examination, tissue biopsy and bone scans are of limited value in predicting disease 
progression. 

Therefore, there is a strong need for a reliable diagnostic procedure which would enable a more 
systematic early-stage prostate cancer prognosis. 

Although an early-stage prostate cancer prognosis is important, the possibility of measuring the 
period of time during which treatment can be deferred is also interesting as currently available 
medicaments are expensive and generate important adverse effects. However, the 
aggressiveness of prostate tumors varies widely. Some tumors are relatively aggressive, 
doubling every six months, whereas others are slow-growing, doubling once every five years. In
fact the majority of prostate cancers grow relatively slowly and never become clinically
manifest. Very often, affected patients are among the elderly and die from another disease
before prostate cancer actually develops. Thus, a significant question in treating prostate
carcinoma is how to discriminate between tumors that will progress and those that will not
progress during the expected lifetime of the patient.

Hence there is also a strong need for detection means which may be used to evaluate the 
aggressiveness or the development potential of prostate cancer tumors once diagnosed.

Furthermore at the present time, there is no means to predict prostate cancer susceptibility. It 
would also be very beneficial to detect individual susceptibility to prostate cancer. This could 
allow preventive treatment and a careful follow up of the development of the tumor. 

A further consequence of the slow growth rate of prostate cancer is that few cancer cells are
actively dividing at any one time, rendering prostate cancer generally resistant to radiation and 
chemotherapy. Surgery is the mainstay of treatment but it is largely ineffective and removes the 
ejaculatory ducts, resulting in impotence. Oral oestrogens and luteinizing releasing hormone 
analogs are also used for treatment of prostate cancer. These hormonal treatments provide 
marked improvement for many patients, but they only provide temporary relief. Indeed, most of 
these cancers soon relapse with the development of hormone-resistant tumor cells and the 
oestrogen treatment can lead to serious cardiovascular complications. Consequently, there is a 
strong need for preventive and curative treatment of prostate cancer. 

Efficacy/tolerance prognosis could be precious in prostate cancer therapy. Indeed, hormonal therapy,
the main treatment currently available, presents important side effects. The use of chemotherapy is
limited because of the small number of patients with chemosensitive tumors. Furthermore, the age
profile of the prostate cancer patient and intolerance to chemotherapy make the systematic use of
this treatment very difficult.

Therefore, a valuable assessment of the eventual efficacy of a medicament to be administered to a
prostate cancer patient as well as the patient's eventual tolerance to it may make it possible to
enhance the benefit/risk ratio of prostate cancer treatment.

It is known today that there is a familial risk of prostate cancer. Clinical studies in the 1950s had
already demonstrated a familial aggregation in prostate cancer. Case-control clinical studies
have been conducted more recently to attempt to evaluate the incidence of the genetic risk
factors in the disease. Thus Steinberg et al., 1990, and McWhorter et al., 1992 confirm that the
risk of prostate cancer is increased in subjects having one or more relatives already affected by
the disease and when forms of early diagnosis in the relatives exist.

It is now well established that cancer is a disease caused by the deregulation of the expression
of certain genes. In fact, the development of a tumor necessitates an important succession of
steps. Each of these steps comprises the deregulation of an important gene intervening in the
normal metabolism of the cell and the emergence of an abnormal cellular sub-clone which
overwhelms the other cell types because of a proliferative advantage. The genetic origin of this
concept has found confirmation in the isolation and the characterization of genes which could be
responsible. These genes, commonly called "cancer genes", have an important role in the
normal metabolism of the cell and are capable of intervening in carcinogenesis following a
change.

Recent studies have identified three groups of genes which are frequently mutated in cancer. 

The first group of genes, called oncogenes, are genes whose products activate cell proliferation.
The normal non-mutant versions are called proto-oncogenes. The mutated forms are excessively
or inappropriately active in promoting cell proliferation, and act in the cell in a dominant way in
that a single mutant allele is enough to affect the cell phenotype. Activated oncogenes are rarely
transmitted as germline mutations since they would probably be lethal when expressed in all the
cells. Therefore oncogenes can only be investigated in tumor tissues.

The second group of genes which are frequently mutated in cancer, called tumor suppressor
genes, are genes whose products inhibit cell growth. Mutant versions in cancer cells have lost
their normal function, and act in the cell in a recessive way in that both copies of the gene must
be inactivated in order to change the cell phenotype. Most importantly, the tumor phenotype can
be rescued by the wild type allele, as shown by cell fusion experiments first described by Harris
and colleagues (1969). Germline mutations of tumor suppressor genes may be transmitted and
thus studied in both constitutional and tumor DNA from familial or sporadic cases. The current
family of tumor suppressors includes DNA-binding transcription factors (e.g., p53, WT1),
transcription regulators (e.g., RB, APC, probably BRCA1), protein kinase inhibitors (e.g., p16),
among others (for review, see Haber D & Harlow E, 1997).

The third group of genes which are frequently mutated in cancer, called mutator genes, are
responsible for maintaining genome integrity and/or low mutation rates. Loss of function of both
alleles increases cell mutation rates, and as a consequence, proto-oncogenes and tumor
suppressor genes may be mutated. Mutator genes can also be classified as tumor suppressor
genes, except for the fact that tumorigenesis caused by this class of genes cannot be
suppressed simply by restoration of a wild-type allele, as described above. Genes whose
inactivation may lead to a mutator phenotype include mismatch repair genes (e.g., MLH1,
MSH2), DNA helicases (e.g., BLM, WRN) or other genes involved in DNA repair and genomic
stability (e.g., p53, possibly BRCA1 and BRCA2) (for review, see Haber D & Harlow E, 1997;
Fishel R & Wilson T, 1997; Ellis NA, 1997).

There is growing evidence that a critical event in the progression of a tumor cell from a
non-metastatic to metastatic phenotype is the loss of function of metastasis-suppressor genes.
These genes specifically suppress the ability of a cell to metastasize. Work from several groups
has demonstrated that human chromosomes 8, 10, 11 and 17 encode prostate cancer
metastasis suppressor activities. However, other human chromosomes such as chromosomes
1, 7, 13, 16, and 18 may also be associated with prostate cancer.

It thus remains to localize and to identify the genes specifically involved in the development and 
the progression of prostate cancers starting from the genetic analysis of the hereditary and the 
non-hereditary forms and to define their clinical implications in terms of prognosis and 
therapeutic innovations. 

SUMMARY OF THE INVENTION

The present invention concerns a human TBC-1 genomic sequence which is included in a
previously unknown candidate region of prostate cancer located on chromosome 4. The
invention also concerns a biallelic marker belonging to the TBC-1 gene which is strongly
associated with prostate cancer.

Human TBC-1 genomic sequences, corresponding cDNA and TBC-1 coding or regulatory
sequences

The human gene encoding a human TBC-1 polypeptide has been found by the inventors to be
located on human chromosome 4. The inventors have isolated and sequenced a portion of the
genomic sequence of the human TBC-1 gene. Thus, a first object of the present invention
relates to a TBC-1 genomic sequence comprising the first three exons and the 5' regulatory
region.

The expressions "TBC-1 gene" or "TBC-1 protein" are herein intended to designate the human
TBC-1 gene or the human TBC-1 protein.

The present invention concerns a purified or isolated nucleic acid encoding a human TBC-1 
protein, wherein said TBC-1 protein comprises an amino acid sequence of SEQ ID No 5, a
nucleotide sequence complementary thereto, a fragment or a variant thereof. 

As used herein, the term "purified" does not require absolute purity; rather, it is intended as a
relative definition. Purification of starting material or natural material to at least one order of
magnitude, preferably two or three orders, and more preferably four or five orders of magnitude
is expressly contemplated. As an example, purification from 0.1 % concentration to 10 %
concentration is two orders of magnitude.
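
The arithmetic behind the example above can be checked with a minimal sketch, assuming that "orders of magnitude" means the base-10 logarithm of the ratio of final to starting concentration. The function name is hypothetical and is introduced only for illustration.

```python
import math


def orders_of_magnitude(start_percent: float, end_percent: float) -> float:
    """Return log10 of the fold enrichment between two concentrations.

    Assumption: both values are expressed in the same unit (here, percent).
    """
    if start_percent <= 0 or end_percent <= 0:
        raise ValueError("concentrations must be positive")
    return math.log10(end_percent / start_percent)


# Example from the text: 0.1 % enriched to 10 % is a 100-fold purification.
enrichment = orders_of_magnitude(0.1, 10.0)
print(round(enrichment, 6))
```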

As used herein, the term "isolated" requires that the material be removed from its original 
environment (e.g., the natural environment if it is naturally occurring). For example, a
naturally-occurring polynucleotide or polypeptide present in a living animal is not isolated, but the same
polynucleotide or DNA or polypeptide, separated from some or all of the coexisting materials in 
the natural system, is isolated. Such polynucleotide could be part of a vector and/or such 
polynucleotide or polypeptide could be part of a composition, and still be isolated in that the 
vector or composition is not part of its natural environment. 

The present invention concerns a purified or isolated nucleic acid comprising a nucleotide 
sequence of SEQ ID No 1, a nucleotide sequence complementary thereto, a fragment or a
variant thereof. 

The invention also concerns a purified or isolated nucleic acid comprising at least 20 
consecutive nucleotides of the nucleotide sequence SEQ ID No 1, or a nucleotide sequence
complementary thereto. 

The invention further deals with a purified or isolated nucleic acid comprising the nucleotide 
sequence of a regulatory region which is located upstream of the first exon of the TBC-1 gene 
and which is contained in the TBC-1 genomic sequence of SEQ ID No 1. The invention thus
encompasses a purified or isolated nucleic acid comprising the nucleotide sequence of SEQ ID 
No 2, or a sequence complementary thereto or biologically active fragment or variant thereof, as 
well as any sequence of 8 to about 2000 consecutive nucleotides, preferably of 10 to 500 
consecutive nucleotides, more preferably 10 to 300, included therein. More particularly, the 
invention further includes specific elements within this regulatory region, these elements 
preferably including the promoter region. 

The present invention is also directed to a polynucleotide comprising a functional portion of a 
regulatory region contained in the contemplated TBC-1 genomic sequence and to its use in a 
recombinant expression vector carrying a polynucleotide encoding a polypeptide or a nucleic 
acid of interest. 



The transcription of the genomic sequence leads to more than one mRNA final product, due to
alternative splicing events. Two mRNAs have been identified and they result from two distinct
first exons, namely Exon 1 and Exon 1bis. Therefore, the present invention concerns a purified
or isolated nucleic acid comprising a nucleotide sequence selected from the group consisting of
the nucleotide sequences of SEQ ID Nos 3 and 4, a nucleotide sequence complementary
thereto or a variant thereof.

The invention also concerns a purified or isolated nucleic acid having at least 95% nucleotide 
identity with a polynucleotide selected from the group consisting of the nucleotide sequences of 
SEQ ID Nos 3 and 4, or a nucleotide sequence complementary thereto.
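
A minimal sketch of how a 95% nucleotide identity threshold might be checked is given below. This passage does not state how identity is computed, so the sketch makes two assumptions: the two sequences are already aligned to equal length (gaps written as "-"), and identity is the number of identical columns divided by the alignment length. The function names and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns carrying the same nucleotide.

    Assumption: inputs are pre-aligned and of equal length; a column
    containing a gap ('-') in either sequence counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("sequences must not be empty")
    pairs = zip(aligned_a.upper(), aligned_b.upper())
    matches = sum(1 for x, y in pairs if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)


def meets_identity_threshold(aligned_a: str, aligned_b: str,
                             threshold: float = 95.0) -> bool:
    """True when the two aligned sequences share at least `threshold` % identity."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Toy 20-nucleotide sequences differing at a single position (19/20 identical).
reference = "ACGTACGTACGTACGTACGT"
candidate = "ACGTACGTACGTACGTACGA"
print(percent_identity(reference, candidate))
print(meets_identity_threshold(reference, candidate))
```

Other conventions, such as excluding gap columns from the denominator, give different values for gapped alignments, so the convention should be fixed before the threshold is applied.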

The invention also deals with a purified or isolated nucleic acid comprising at least 20
consecutive nucleotides of a polynucleotide selected from the group consisting of the nucleotide
sequences of SEQ ID Nos 3 and 4.

A further object of the invention consists in polynucleotide fragments of the TBC-1 genomic
sequence that are useful for detecting the presence of an unaltered or altered copy of this gene
within the human genome of a given individual and also for the detection and/or quantification of
the expression of the TBC-1 gene in said individual host organism.

When used herein, an altered copy of the TBC-1 gene according to the invention is intended to
designate the TBC-1 gene that has undergone at least one substitution or deletion of one or
several nucleotides, wherein said nucleotide substitution, addition or deletion of one or several
nucleotides causes a change in the amino acid sequence of SEQ ID No 5 or alternatively
causes an increase or a decrease in the expression of the TBC-1 gene.

When used herein, the term "TBC-1 gene" is intended to define an entity which can comprise
some or all the following elements : exons, introns, promoter, regulatory regions, 5' UTR, 3' UTR
and regions never transcribed and located either upstream or downstream of the coding
sequence of TBC-1.

Another aspect of the present invention is a purified and/or isolated TBC-1 genomic sequence 
comprising at least one of the biallelic polymorphisms according to the present invention, a 
sequence complementary thereto, a fragment or a variant thereof. In a preferred embodiment,
the TBC-1 gene comprises one of the nucleotide sequences of SEQ ID Nos 7 and 8, a
sequence complementary thereto, a fragment or a variant thereof.

Biallelic markers 

The invention also concerns a purified and/or isolated biallelic marker located in the sequence
of the TBC-1 gene, preferably a biallelic marker comprising an allele associated with prostate
cancer, with an early onset of prostate cancer, with a response to a prophylactic or therapeutic
agent administered for cancer treatment, particularly prostate cancer, with the level of
aggressiveness of prostate cancer tumors, with a modified or forthcoming expression of the
TBC-1 gene, with a modified or forthcoming production of the TBC-1 protein, or with the
production of a modified TBC-1 protein.

As used herein, the term "aggressiveness" of prostate cancer tumors refers to the metastatic 
potential of these tumors. 

The invention also relates to a nucleotide sequence, preferably a purified and/or isolated
polynucleotide comprising a sequence defining a biallelic marker located in the sequence of the
TBC-1 gene, particularly the genomic sequence of SEQ ID No 1, a fragment or variant thereof
or a sequence complementary thereto. As used herein, the terminology "defining a biallelic
marker" means that a sequence includes a polymorphic base from a biallelic marker. The
sequences defining a biallelic marker may be of any length consistent with their intended use,
provided that they contain a polymorphic base from a biallelic marker. The sequence is
between 1 and 500 nucleotides in length, preferably between 5, 10, 15, 20, 25 or 40 and 200
nucleotides and more preferably between 30 and 50 nucleotides in length. Preferably, the
sequences defining a biallelic marker include the polymorphic base of one of SEQ ID Nos 7 and
8. In some embodiments the sequences defining a biallelic marker comprise one of the
sequences selected from the group consisting of SEQ ID Nos 7 and 8. Likewise, the term
"marker" or "biallelic marker" requires that the sequence is of sufficient length to practically
(although not necessarily unambiguously) identify the polymorphic allele, which usually implies
a length of at least 4, 5, 6, 10, 15, 20, 25 or 40 nucleotides.

The invention further concerns a nucleic acid encoding a TBC-1 protein, wherein said nucleic
acid comprises a nucleotide sequence selected from the group consisting of SEQ ID Nos 7 and
8.

The invention also relates to a nucleotide sequence selected from the group consisting of SEQ ID
Nos 7 and 8 or a fragment or a variant thereof.

The invention also pertains to a nucleotide sequence selected from the group consisting of a 
variant or fragment of SEQ ID Nos 7 and 8, said fragment comprising at least 8 consecutive
nucleotides of a sequence selected from the group consisting of SEQ ID Nos 7 and 8 and 
including the polymorphic base thereof. 
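
The constraint that a fragment be of a minimum length and include the polymorphic base can be illustrated with a short sketch. The sequence below is a made-up toy string, not SEQ ID No 7 or 8, and the polymorphic base is marked with the IUPAC ambiguity code R purely for illustration; the function name and the 0-based index convention are likewise assumptions.

```python
def fragments_with_polymorphic_base(sequence: str, poly_index: int,
                                    length: int = 8) -> list[str]:
    """Return every window of `length` consecutive nucleotides of `sequence`
    that contains the position `poly_index` (0-based)."""
    if not 0 <= poly_index < len(sequence):
        raise ValueError("poly_index is outside the sequence")
    # Earliest window start that still reaches the polymorphic base,
    # and latest start that still fits inside the sequence.
    first_start = max(0, poly_index - length + 1)
    last_start = min(poly_index, len(sequence) - length)
    return [sequence[s:s + length] for s in range(first_start, last_start + 1)]


# Hypothetical 17-nucleotide sequence with the polymorphic base at index 8.
toy_sequence = "ACGTACGTRACGTACGT"
for fragment in fragments_with_polymorphic_base(toy_sequence, 8):
    print(fragment)
```

Longer fragments are obtained by raising `length`; every window returned contains the polymorphic position by construction.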

Identification and characterization of further biallelic markers 

Another aspect of the present invention is a method for identifying biallelic markers in the
genomic region harboring the TBC-1 gene. The method comprises the steps of :

- obtaining a nucleic acid containing the genomic region harboring the TBC-1 gene, preferably a
nucleic acid comprising the nucleotide sequence of SEQ ID No 1 or a nucleic acid comprising a
portion of a nucleotide sequence selected from the group consisting of SEQ ID Nos 3 and 4 that
is present in the TBC-1 genomic sequence of SEQ ID No 1, or fragments or variants thereof;

- sequencing at least a portion of said nucleic acid containing the genomic region harboring the
TBC-1 genomic sequence to determine a plurality of primer sequences capable of amplifying
portions of said genomic region harboring the TBC-1 gene;

- amplifying portions of the genomic region harboring the TBC-1 genomic sequence from a
plurality of individuals using said primers to obtain a plurality of amplicons; and

- sequencing said plurality of amplicons to identify biallelic markers in the genomic region
harboring the TBC-1 genomic sequence.
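
The final step, comparing amplicon sequences from several individuals to find biallelic sites, can be sketched as follows. This is a simplified illustration under stated assumptions: the amplicon reads are already aligned to the same coordinates, each read represents a single allele (heterozygous base calls are not modeled), and "N" or "-" marks an uncalled position. The function name and the toy reads are hypothetical.

```python
def find_biallelic_positions(aligned_reads: list[str]) -> dict[int, tuple[str, ...]]:
    """Map each 0-based position showing exactly two distinct bases
    across the reads to the sorted pair of observed alleles."""
    if not aligned_reads:
        return {}
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("all reads must be aligned to the same length")
    markers = {}
    for position in range(length):
        # Collect the called bases at this column, ignoring uncalled ones.
        alleles = {read[position].upper() for read in aligned_reads} - {"N", "-"}
        if len(alleles) == 2:
            markers[position] = tuple(sorted(alleles))
    return markers


# Four toy amplicon reads; positions 2 and 4 each show two alleles.
reads = ["ACGTAC", "ACGTAC", "ACATAC", "ACGTCC"]
print(find_biallelic_positions(reads))
```

In practice a candidate site would also be filtered on sequencing quality and on the frequency of the rarer allele before being retained as a marker.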

The invention also concerns a method for the identification and characterization of a biallelic
marker in linkage disequilibrium with a biallelic marker of the TBC-1 gene, preferably a biallelic
marker of the TBC-1 gene of which one allele is associated with a trait, preferably with prostate
cancer. In one embodiment, the biallelic marker of the TBC-1 gene is in the genomic region
harboring the TBC-1 gene, but outside of the TBC-1 gene itself. In another embodiment, the
biallelic marker in linkage disequilibrium with a biallelic marker of the TBC-1 gene is itself located
within the TBC-1 gene.

The method comprises the following steps: 

- providing a first biallelic marker of the TBC-1 gene; 

- amplifying a portion of the genomic region harboring the TBC-1 gene;

- sequencing and identifying second biallelic markers in said amplified portion;

- conducting a linkage disequilibrium analysis between said first biallelic marker and second
biallelic markers; and

- identifying second biallelic markers in linkage disequilibrium with said first marker. 
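
The linkage disequilibrium analysis named in the steps above can be illustrated with a minimal sketch. It uses the standard pairwise measures D, D' and r² and assumes that phased haplotype counts for the two markers are available. With unphased genotype data, haplotype frequencies would first have to be estimated, which is not shown here. The function and argument names are hypothetical.

```python
def linkage_disequilibrium(n_AB: int, n_Ab: int, n_aB: int, n_ab: int):
    """Return (D, D', r^2) for two biallelic markers from haplotype counts.

    A/a are the alleles of the first marker, B/b those of the second.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    if total == 0:
        raise ValueError("at least one haplotype is required")
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total
    p_B = (n_AB + n_aB) / total
    # D: departure of the AB haplotype frequency from independence.
    d = p_AB - p_A * p_B
    # D' normalizes D by its largest attainable magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = d / d_max if d_max > 0 else 0.0
    # r^2: squared correlation between the alleles at the two markers.
    denominator = p_A * (1 - p_A) * p_B * (1 - p_B)
    r_squared = d * d / denominator if denominator > 0 else 0.0
    return d, d_prime, r_squared


# Complete disequilibrium: only AB and ab haplotypes are observed.
print(linkage_disequilibrium(50, 0, 0, 50))
# Independence: all four haplotypes are equally frequent.
print(linkage_disequilibrium(25, 25, 25, 25))
```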

In a preferred embodiment, the step of sequencing and identifying second biallelic markers
comprises sequencing second biallelic markers within the TBC-1 gene.

Once identified, the sequences in linkage disequilibrium with a biallelic marker of the TBC-1
gene may be used in any of the methods described herein, including methods for determining
an association between a biallelic marker and a trait, methods for identifying individuals having
a predisposition for a trait, methods of administration of prophylactic or therapeutic agents
for disease treatment, methods of identifying individuals likely to respond positively or negatively to
said agents, and methods of using drugs and vaccines.

Oligonucleotide probes and primers 

The invention relates also to oligonucleotide molecules useful as probes or primers, wherein
said oligonucleotide molecules hybridize specifically with a nucleotide sequence comprised in
the TBC-1 gene, particularly the TBC-1 genomic sequence of SEQ ID No 1. More particularly,
the present invention also concerns oligonucleotides for the detection of alleles of biallelic
markers of the TBC-1 gene, preferably those associated with prostate cancer, with an early
onset of prostate cancer, with a susceptibility to prostate cancer, with the level of
aggressiveness of prostate cancer tumors, with a modified or forthcoming expression of the
TBC-1 gene, with a modified production of the TBC-1 protein, or with the production of a
modified TBC-1 protein. These oligonucleotides are characterized in that they can hybridize with
a TBC-1 gene, preferably with a polymorphic TBC-1 gene and more preferably with a region of
a TBC-1 gene comprising a polymorphic site of a specific allele associated with prostate cancer,
with the level of aggressiveness of prostate cancer tumors or with modifications in the regulation
of the expression of the TBC-1 gene. These oligonucleotides are useful either as primers for
use in various processes such as DNA amplification and microsequencing or as probes for DNA
recognition in hybridization analyses.

In a first embodiment, a purified or isolated nucleic acid useful as an amplification primer or as a 
probe of the invention comprises a polynucleotide selected from the group consisting of the 
nucleotide sequences complementary to any sequence of a strand of the TBC-1 gene and the 
nucleotide sequences complementary to any one of the SEQ ID Nos 2-4. 



A nucleic acid probe or primer according to the invention comprises at least 8 consecutive
nucleotides of a polynucleotide of SEQ ID No 1, preferably from 8 to 200 consecutive 
nucleotides, more particularly from 10, 15, 20 or 30 to 100 consecutive nucleotides, more 
preferably from 10 to 50 nucleotides, and most preferably from 40 to 50 consecutive nucleotides 
of a polynucleotide of SEQ ID No 1. 

Preferred probes or primers of the invention comprise at least 20 consecutive nucleotides of a 
polynucleotide selected from the group consisting of : (a) the nucleic acid beginning at the 
nucleotide in position 1999 and ending at the nucleotide in position 13248 of SEQ ID No 1; (b) 
the nucleic acid of SEQ ID No 3; (c) the nucleic acid of SEQ ID No 4. 

Preferred probes or primers of the invention comprise at least 8 consecutive nucleotides of a 
polynucleotide consisting of the nucleotide sequence beginning at the nucleotide in position 1 
and ending at the nucleotide in position 2000 of the nucleotide sequence of SEQ ID No 1. 

In some embodiments, the oligonucleotides comprise the polymorphic base of a sequence
selected from SEQ ID Nos 7-8. In other embodiments, the oligonucleotides, preferably those
selected from SEQ ID Nos 11-12, have a 3' terminus immediately adjacent to a polymorphic
base in the TBC-1 gene, such as a polymorphic base in one of SEQ ID Nos 7-8. In other
embodiments, the oligonucleotide is capable of discriminating between different alleles of a
biallelic marker in the TBC-1 gene, including the biallelic markers of SEQ ID Nos 7-8. For
example, the oligonucleotide may be capable of specifically hybridizing to one allele of a biallelic
marker, including one of the biallelic markers of SEQ ID Nos 7-8.
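
An oligonucleotide whose 3' terminus is immediately adjacent to a polymorphic base can be derived from a marker sequence as sketched below. The input is a made-up toy sequence, not SEQ ID Nos 7-8 or 11-12, with the polymorphic base written as the IUPAC code R for illustration; the function names, the 0-based index and the primer length are assumptions.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def adjacent_primers(sequence: str, poly_index: int, length: int) -> tuple[str, str]:
    """Return two primers (written 5'->3') whose 3' ends abut the polymorphic base.

    The first anneals to the complementary strand and ends just upstream of
    `poly_index`; the second anneals to the given strand and ends just downstream.
    """
    if poly_index - length < 0 or poly_index + 1 + length > len(sequence):
        raise ValueError("not enough flanking sequence for the requested length")
    upstream = sequence[poly_index - length:poly_index]
    downstream = sequence[poly_index + 1:poly_index + 1 + length]
    return upstream, reverse_complement(downstream)


# Toy sequence with the polymorphic base (R) at index 11.
toy = "GATTACAGGCT" + "R" + "CCTGAAGTCAA"
forward, reverse = adjacent_primers(toy, 11, 5)
print(forward)  # AGGCT
print(reverse)  # TCAGG
```

Either primer can then be extended by a single labeled nucleotide across the polymorphic position, which is the principle behind microsequencing.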

In a first preferred embodiment, the probe or primer is suspended in a suitable buffer in view of
performing a hybridization or an amplification reaction.

In a second embodiment, the oligonucleotide probe, which may be immobilized on a support, is
capable of hybridizing with a TBC-1 gene, particularly with the genomic sequence of SEQ ID No
1, preferably with a region of the TBC-1 gene which comprises a biallelic marker of the present
invention. The techniques for immobilizing a nucleotide primer or probe on a solid support are
well-known to the skilled artisan and include, but are not limited to, the immobilization
techniques described in the present application.



In a third embodiment, the primer is complementary to any nucleotide sequence of the TBC-1
gene, particularly of the TBC-1 genomic sequence of SEQ ID No 1, and can be used to amplify
a region of the TBC-1 gene contained in the nucleic acid sample to be tested which includes a
polymorphic base of at least one biallelic marker according to the present invention. Preferably,
the amplified region includes a polymorphic base of the biallelic marker defined by SEQ ID Nos
7-8.

In a fourth embodiment, the oligonucleotides of the invention can hybridize with at least a portion of
an intron or of the regulatory sequences of the TBC-1 gene. Particularly preferred oligonucleotides
of the invention hybridize with a sequence comprised in an intron or in the regulatory sequences of
the TBC-1 gene. In another preferred embodiment, the oligonucleotides of the invention can
hybridize with at least a portion of an exon selected in the group of exons 1, 1bis and 2. The length
of oligonucleotides hybridizing, either partially or completely, with such regions of the TBC-1 gene
is between 1 and 100, preferably between 1 and 50, more preferably between 5 and 30 and even
more preferably between 5 and 15 nucleotides.

The terms "sample" or "material sample" are used herein to designate a solid or a liquid
material suspected to contain a polynucleotide or a polypeptide of the invention. A solid material
may be, for example, a tissue slice or biopsy which is examined for the presence of a
polynucleotide encoding a TBC-1 protein, either a DNA or RNA molecule, for the presence of
a native or a mutated TBC-1 protein, or alternatively for the presence of a
desired protein of interest the expression of which has been placed under the control of a TBC-1
regulatory polynucleotide. A liquid material may be, for example, any body fluid such as serum,
urine etc., or a liquid solution resulting from the extraction of nucleic acid or protein material of
interest from a cell suspension or from cells in a tissue slice or biopsy. The term "biological
sample" is also used and is more precisely defined within the Section dealing with DNA
extraction.

The nucleic acid probes and primers of the invention are also used to detect and/or amplify a 
portion of the TBC-1 gene within which a mutation causes a change either in the expression 
level of the TBC-1 gene or a change in the amino acid sequence of the TBC-1 gene translation 
product. 



Amplification of the TBC-1 gene

The invention also concerns a method for the amplification of a region of the TBC-1 gene,
particularly the TBC-1 genomic sequence of SEQ ID No 1, or a fragment or a variant thereof in
a test sample. The method comprises the steps of :

- contacting a test sample suspected of containing the desired TBC-1 sequence or portion
thereof with amplification reaction reagents comprising a pair of amplification primers such as
those described above, the primers being located on either side of the TBC-1 nucleotide region
to be amplified.

The method may further comprise the step of detecting the amplification
product. For example, the amplification product may be detected using a detection probe that
can hybridize with an internal region of the amplicon sequences. Alternatively, the amplification
product may be detected with any of the primers used for the amplification reaction themselves,
optionally under a labeled form.
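
The placement of a primer pair on either side of the region to be amplified can be illustrated in silico. The sketch below assumes exact primer matches on a single template strand and ignores mismatches, melting temperatures and multiple binding sites; the template and primers are made-up toy strings, and the function names are hypothetical.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def predicted_amplicon(template: str, forward_primer: str,
                       reverse_primer: str) -> Optional[str]:
    """Return the template segment delimited by the two primers, or None.

    The forward primer must match the template strand; the reverse primer
    must match the complementary strand downstream of the forward primer.
    """
    template = template.upper()
    forward_primer = forward_primer.upper()
    start = template.find(forward_primer)
    if start == -1:
        return None
    # The reverse primer's binding site appears on the template strand
    # as the reverse complement of the primer.
    reverse_site = reverse_complement(reverse_primer.upper())
    end = template.find(reverse_site, start + len(forward_primer))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]


toy_template = "TTTGATTACAGGCTACCTGAAGTCAATTT"
print(predicted_amplicon(toy_template, "GATTACA", "TTGACTTC"))
# GATTACAGGCTACCTGAAGTCAA
```

A detection probe, as mentioned in the text, would be chosen from the internal part of the returned amplicon, between the two primer sites.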

Identification of an association between a set of biallelic markers and a trait 

It is another object of the present invention to provide a method for the identification and
characterization of an association between an allele of one or more biallelic markers of a TBC-1
gene and a trait. The method comprises the steps of :

- genotyping a marker or a group of biallelic markers according to the invention in trait positive 
and trait negative individuals; and 

- establishing a statistically significant association between one allele of at least one marker and
the trait. 
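
One common way to carry out the two steps above is a chi-square test on allele counts in trait positive and trait negative individuals. This passage does not specify which statistical test is used, so the sketch below is an assumption: it applies Pearson's chi-square test (1 degree of freedom, no continuity correction) to a 2 x 2 table of allele counts, and the counts shown are invented for illustration.

```python
import math


def allele_association(case_a1: int, case_a2: int,
                       control_a1: int, control_a2: int) -> tuple[float, float]:
    """Return (chi-square statistic, p-value) for a 2 x 2 allele-count table.

    Rows are trait positive (case) and trait negative (control) groups;
    columns are the two alleles of a biallelic marker.
    """
    n = case_a1 + case_a2 + control_a1 + control_a2
    row_case = case_a1 + case_a2
    row_control = control_a1 + control_a2
    col_a1 = case_a1 + control_a1
    col_a2 = case_a2 + control_a2
    if 0 in (row_case, row_control, col_a1, col_a2):
        raise ValueError("every row and column total must be non-zero")
    chi2 = (n * (case_a1 * control_a2 - case_a2 * control_a1) ** 2
            / (row_case * row_control * col_a1 * col_a2))
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Invented counts: allele 1 seen on 60 of 100 case chromosomes
# and on 40 of 100 control chromosomes.
chi2, p_value = allele_association(60, 40, 40, 60)
print(chi2, p_value)
```

Each individual contributes two alleles to the table, so 100 chromosomes correspond to 50 genotyped individuals per group in this toy example.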

Preferably, the trait positive and trait negative individuals are selected from non-overiapping 
phenotypes as regards to the trait under study. In one embodiment, the biallelic mari<er is 
defined by the sequences of SEQ ID Nos 7-8. 
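
As a non-limiting illustration of the second step, the allele counts observed in trait positive and trait negative individuals can be compared in a 2x2 table. The specification does not prescribe a particular statistical test at this point; the chi-square test with one degree of freedom used below, the function name and the counts are assumptions made for the sake of the example.

```python
# Illustrative sketch: chi-square test (1 degree of freedom) on allele counts.
# The choice of test and all counts are assumptions, not data from the study.
import math


def allele_association(case_a: int, case_b: int, control_a: int, control_b: int):
    """Return (chi2, p_value, odds_ratio) for a 2x2 table of allele counts.

    case_a / case_b: counts of allele A / allele B in trait positive individuals.
    control_a / control_b: the same counts in trait negative individuals.
    """
    total = case_a + case_b + control_a + control_b
    rows = (case_a + case_b, control_a + control_b)
    cols = (case_a + control_a, case_b + control_b)
    if 0 in rows or 0 in cols:
        raise ValueError("each row and column of the table must be non-empty")
    chi2 = total * (case_a * control_b - case_b * control_a) ** 2 / (
        rows[0] * rows[1] * cols[0] * cols[1]
    )
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    denominator = case_b * control_a
    odds_ratio = (case_a * control_b) / denominator if denominator else math.inf
    return chi2, p_value, odds_ratio


if __name__ == "__main__":
    chi2, p_value, odds_ratio = allele_association(60, 40, 40, 60)
    print(f"chi2={chi2:.2f} p={p_value:.4f} OR={odds_ratio:.2f}")
```

A p-value below a pre-selected significance level would be taken as evidence of association; that level, and any correction for multiple markers, is left to the skilled person.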

In a preferred embodiment, the trait is prostate cancer, an early onset of prostate cancer, a 
susceptibility to prostate cancer, the level of aggressiveness of prostate cancer tumors, a 
modified expression of the TBC-1 gene, a modified production of the TBC-1 protein, or the 
production of a modified TBC-1 protein. 

If the trait is a beneficial response or inversely a side effect to treatment of prostate cancer, the 
method of the invention referred to above further comprises some or all of the following steps: 

- selecting a population or cohort of subjects diagnosed as suffering from prostate cancer; 

- administering a specified treatment of prostate cancer to said cohort of subjects; 

- monitoring the outcome of drug administration and identifying those individuals that are trait 
positive or trait negative relative to the treatment; 

- taking from said cohort biological samples containing DNA and testing this DNA for the 
presence of a specific allele or of a set of alleles of biallelic markers of the TBC-1 gene; 

- analyzing the distribution of alleles of biallelic markers between trait positive and trait negative 
individuals; and 

- performing a statistical analysis to determine a statistically significant association between the 
presence or absence of the specific allele or of a specific set of alleles of biallelic markers of the 
TBC-1 gene and the treatment related trait. 

The step of testing for and detecting the presence of DNA comprising specific alleles of a 
biallelic marker or a group of biallelic markers of the present invention can be carried out as 
described further below. 


Identification of a trait causing mutation in the TBC-1 gene 

A further embodiment of the invention is a method to identify a trait causing mutation in the 
TBC-1 gene, particularly the genomic sequence of SEQ ID No 1, pursuant to the detection of an 
association between alleles of one or several of the biallelic markers of the present invention 
and a particular trait. The trait can in particular be prostate cancer, the level of 
aggressiveness of prostate cancer tumors, a modified expression of the TBC-1 gene, a modified 
production of the TBC-1 protein, or the production of a modified TBC-1 protein. 

A trait causing mutation is a mutation which is at least partly responsible for a particular 
detectable phenotype in an individual. The mutation may comprise point mutations, deletions, or 
insertions in the TBC-1 gene, particularly in the genomic sequence of SEQ ID No 1. The 
mutations may lie within the coding sequence for TBC-1 protein or within intronic and/or 
regulatory regions in the TBC-1 gene, particularly in the genomic sequence of SEQ ID No 1, 
including splice sites, 5' UTRs, 3' UTRs and promoter sequences, including one or more 
transcription factor binding sites. 

This method comprises the following steps : 

- amplifying a region of the TBC-1 gene, particularly of the TBC-1 genomic sequence of SEQ ID 
No 1, from DNA samples of trait positive and trait negative individuals; 

- sequencing the amplified region; 

- comparing DNA sequences from trait positive and trait negative individuals; and 

- determining mutations specific to trait positive patients. 
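
The comparison and determination steps above can be sketched as follows, purely by way of example. The sketch assumes that the sequenced regions have already been aligned to a common length; the function name and the short sequences are hypothetical and are not TBC-1 sequences.

```python
# Illustrative sketch: report bases seen only in trait positive individuals.
# Assumes pre-aligned sequences of equal length; sequences are hypothetical.


def trait_specific_variants(positive: list, negative: list) -> list:
    """Return [(position, bases)] for bases observed only in trait positive samples.

    Positions are 1-based. A base is reported when it occurs in at least one
    trait positive sequence and in no trait negative sequence at that position.
    """
    if not positive or not negative:
        raise ValueError("both groups must contain at least one sequence")
    length = len(positive[0])
    if any(len(seq) != length for seq in positive + negative):
        raise ValueError("sequences must be aligned to the same length")
    variants = []
    for index in range(length):
        positive_bases = {seq[index].upper() for seq in positive}
        negative_bases = {seq[index].upper() for seq in negative}
        only_positive = positive_bases - negative_bases
        if only_positive:
            variants.append((index + 1, sorted(only_positive)))
    return variants


if __name__ == "__main__":
    trait_positive = ["ACGTAC", "ACGTTC"]
    trait_negative = ["ACGTAC", "ACGTAC"]
    print(trait_specific_variants(trait_positive, trait_negative))
```

A candidate identified in this way would still have to be confirmed as a trait causing mutation by the association analysis described above.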

In some embodiments, the amplified region is a region located close to a biallelic marker of 
the TBC-1 gene. In a further embodiment, the amplified region is located close to the biallelic 
marker defined by the sequences of SEQ ID Nos 7-8. 

The invention also concerns a mutated TBC-1 gene, particularly a mutated TBC-1 genomic 
sequence of SEQ ID No 1, comprising a trait causing mutation, and particularly the mutated 
genes obtained by the process described above. 

A mutated TBC-1 gene can be defined as a gene encoding either a modified or native TBC-1 
protein through a nucleotide sequence which is different from the nucleotide sequence of the 
TBC-1 gene found in a majority of trait negative individuals. 






Detection of markers or groups of markers associated with a trait 

The invention also concerns a method for the detection in an individual of alleles associated 
with a trait preferably selected from prostate cancer, an early onset of prostate cancer, a 
susceptibility to prostate cancer, the level of aggressiveness of prostate cancer tumors, and 
the expression of the TBC-1 gene. 

This method comprises the following steps : 

- obtaining a nucleic acid sample from the individual to be tested, and 

- determining the presence in the sample of an allele of a biallelic marker or of a group of 
biallelic markers of the TBC-1 gene which, when taken alone or in combination with 
another/other biallelic marker/s of the TBC-1 gene, is indicative of prostate cancer, of an early 
onset of prostate cancer, of the level of aggressiveness of prostate cancer tumors, of a modified 
expression of the TBC-1 gene, of a modified production of the TBC-1 protein, or of the 
production of a modified TBC-1 protein. 

In a preferred embodiment, the biallelic marker comprises the sequences of SEQ ID Nos 7-8. 

More particularly, the detection method of the present invention comprises the following steps: 

- obtaining a nucleic acid sample from the individual to be tested, 

- amplifying a nucleotide sequence of the TBC-1 gene, particularly of the TBC-1 genomic 
sequence of SEQ ID No 1, contained in the sample, and 

- detecting the presence in the sample of an allele of a biallelic marker or of a group of biallelic 
markers of the TBC-1 gene which, when taken alone or in combination with another/other 
biallelic markers of the TBC-1 gene, is indicative of prostate cancer, of an early onset of prostate 
cancer, of the level of aggressiveness of prostate cancer tumors, of a modified expression of the 
TBC-1 gene, of a modified production of the TBC-1 protein, or of the production of a modified 
TBC-1 protein. 

In a preferred embodiment, the biallelic marker comprises the sequences of SEQ ID Nos 7-8. 

In preferred embodiments of the two detection methods described above, the presence of 
alleles of one or more biallelic markers of the TBC-1 gene is determined through 
microsequencing reactions using microsequencing primers such as those of SEQ ID Nos 11-12. 
More particularly, it is preferred that the microsequencing primers be bound to a solid support, 
preferably in the form of arrays of primers attached to appropriate substrates, for example chips, 
or be used in microfluidic devices. Such arrays are described in further detail in the next section. 

Alternatively, the detection method of the present invention can comprise the following steps: 

- obtaining a nucleic acid sample from the individual to be tested, 

- specifically amplifying a nucleotide sequence of the TBC-1 gene comprising an allele of a 
biallelic marker or of a group of biallelic markers of a TBC-1 gene which, when taken alone or in 
combination with another/other biallelic marker/s of a TBC-1 gene, is indicative of prostate 
cancer, of an early onset of prostate cancer, of the level of aggressiveness of prostate cancer 
tumors, of a modified expression of the TBC-1 gene, of a modified production of the TBC-1 
protein, or of the production of a modified TBC-1 protein, and 

- detecting the presence of an amplification product. 

In a preferred embodiment, the biallelic marker comprises the sequences of SEQ ID Nos 7-8. 

In a further embodiment of the present invention, another detection method comprises the 
following steps: 

- contacting a hybridization probe with a biological sample under conditions which allow 
hybridization of the probe to the nucleotide sequence of the TBC-1 gene comprising an allele of 
a biallelic marker which is indicative of prostate cancer, of an early onset of prostate cancer, of a 
susceptibility to prostate cancer, of the level of aggressiveness of prostate cancer tumors, of a 
modified expression of the TBC-1 gene, of a modified production of the TBC-1 protein, or of the 
production of a modified TBC-1 protein, and 

- detecting the formation of a hybrid comprising the probe and a nucleotide sequence from the 
biological sample. 

In a preferred embodiment, the biallelic marker comprises the sequences of SEQ ID Nos 7-8. 

Preferably, the formation of a hybrid is detected through the detection of a signal from a label 
attached to the probe. More preferably, the signal is amplified prior to being revealed. 

In yet another embodiment of the present invention, a further detection method comprises the 
following steps: 

- contacting a hybridization probe immobilized on a solid support with a biological sample under 
conditions which allow hybridization of the probe to the nucleotide sequence of the TBC-1 gene 
comprising an allele of a biallelic marker which is indicative of prostate cancer, of an early onset of 
prostate cancer, of a susceptibility to prostate cancer, of the level of aggressiveness of prostate 
cancer tumors, of a modified expression of the TBC-1 gene, of a modified production of the TBC-1 
protein, or of the production of a modified TBC-1 protein, 

- contacting the hybrid thus formed with a labeled hybridization probe, and 

- revealing the formation of a hybrid comprising the hybridization probe, a nucleotide sequence 
from the biological sample and the labeled probe. 

In a preferred embodiment, the biallelic marker comprises the sequences of SEQ ID Nos 7-8. 

Preferably, the nucleotide sequences of the biological sample are amplified prior to hybridization 
using one of the primers described herewith. 

The invention also specifically relates to a method of determining whether an individual suffering 
from prostate cancer or susceptible to developing prostate cancer is likely to respond positively 
to treatment with a selected medicament acting against prostate cancer. 

The method comprises the following steps: 

- obtaining a DNA sample from the individual to be tested, and 

- analyzing said DNA sample to determine whether it comprises alleles of one or more biallelic 
markers associated with a positive response to treatment with the medicament and/or alleles of 
one or more biallelic markers associated with a negative response to treatment with the 
medicament. 

Vectors and host cells 

A further object of the present invention is a recombinant vector for the expression of a 
heterologous polynucleotide, wherein said vector comprises a nucleic acid comprising a 
nucleotide sequence of SEQ ID No 2, or biologically active nucleotide fragments and variants 
thereof. The heterologous polynucleotide codes either for a desired polypeptide of interest or for 
a nucleic acid, for example a sense or an antisense DNA molecule. 

In a specific embodiment, a recombinant vector of the invention comprises a nucleic acid 
comprising a polynucleotide encoding a human TBC-1 protein or a fragment or variant thereof. 



Another recombinant vector of the invention comprises a nucleic acid comprising a 
polynucleotide selected from the group consisting of the nucleotide sequences of SEQ ID Nos 3 
and 4, or a regulatory sequence contained therein or the TBC-1 coding sequence contained 
therein, or fragments or variants thereof. Preferred nucleic acid fragments or variants consist of 
biologically active fragments or variants thereof. 

TBC-1 polypeptides 

The invention also concerns a purified or isolated TBC-1 polypeptide encoded by a nucleic acid 
carrying the TBC-1 gene, a TBC-1 cDNA, or a fragment or a variant thereof. 



More particularly, the invention also relates to a purified or isolated polypeptide that is encoded 
by a nucleic acid selected from the group consisting of SEQ ID Nos 3 and 4 or a fragment or a 
variant thereof. 

The invention also deals with a purified or isolated TBC-1 polypeptide that is encoded by a 
nucleic acid comprising a biallelic marker according to the present invention. In one 
embodiment, the invention concerns a purified or isolated TBC-1 polypeptide that is encoded by 
a nucleic acid comprising a nucleotide sequence selected from the group consisting of SEQ ID 
Nos 7-8. 






More particularly, the invention relates to a purified or isolated TBC-1 polypeptide comprising an 
amino acid sequence of SEQ ID No 5 or a fragment or a variant thereof. 

Antibodies 

The invention also concerns a purified or isolated antibody which is capable of specifically 
binding to the TBC-1 protein comprising the amino acid sequence of SEQ ID No 5. 

The invention also deals with methods and kits for detecting the presence of the polypeptide 
comprising the amino acid sequence of SEQ ID No 5 in a test sample. 

The method particularly comprises contacting a test sample suspected of containing the amino 
acid sequence of SEQ ID No 5 with an antibody of the invention. 

The kit comprises an antibody of the invention and preferably means for revealing the formation 
of an antigen-antibody complex. 

Complementary polynucleotides 

For the purpose of the present invention, a first polynucleotide is deemed to be complementary to a 
second polynucleotide when each base in the first polynucleotide is paired with its complementary 
base. Complementary bases are, generally, A and T (or A and U), or C and G. 
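
The definition above can be expressed, as a sketch only, by a base-by-base check. The function name is hypothetical, and the sketch assumes that both polynucleotides are written 5' to 3' and pair in antiparallel orientation, which the definition itself does not state explicitly.

```python
# Illustrative sketch of the complementarity definition given above.
# Assumption: both sequences are written 5'->3' and pair antiparallel.

_PAIRS = {
    ("A", "T"), ("T", "A"),
    ("A", "U"), ("U", "A"),
    ("C", "G"), ("G", "C"),
}


def is_complementary(first: str, second: str) -> bool:
    """Return True when each base of `first` pairs with its complementary base."""
    first, second = first.upper(), second.upper()
    if len(first) != len(second):
        return False
    return all((a, b) in _PAIRS for a, b in zip(first, reversed(second)))


if __name__ == "__main__":
    print(is_complementary("ATGC", "GCAT"))
    print(is_complementary("ATGC", "GCAA"))
```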

Diagnostic kits 

Another object of the invention consists of diagnostic kits for detecting the presence of at least 
one copy of a TBC-1 DNA in a test sample, said kits containing a primer, a pair of primers or a 
probe according to the invention. 

The invention also relates to diagnostic kits useful for determining the presence in a DNA 
sample of alleles associated with prostate cancer, with an early onset of prostate cancer, with 
the level of aggressiveness of prostate cancer tumors, with a modified expression of the TBC-1 
gene, with a modified production of the TBC-1 protein, or with the production of a modified 
TBC-1 protein. 

In a first embodiment, the kit comprises primers such as those described above, preferably 
forward and reverse primers which are used to amplify the TBC-1 gene, particularly the genomic 
sequence of SEQ ID No 1, or a fragment thereof. In some embodiments, at least one of the 
primers is complementary to a nucleotide sequence of the TBC-1 gene comprising a biallelic 
marker associated with prostate cancer, with an early onset of prostate cancer, with the level of 
aggressiveness of prostate cancer tumors, with a modified expression of the TBC-1 gene, with a 
modified production of the TBC-1 protein, or with the production of a modified TBC-1 protein. In 
one embodiment, the biallelic marker comprises one of the sequences of SEQ ID Nos 7-8. 

In a second embodiment, the kit comprises microsequencing primers, wherein at least one of 
said primers is an oligonucleotide capable of hybridizing, either with the coding or with the non- 
coding strand, immediately upstream of the polymorphic base of a biallelic marker, preferably 
the biallelic marker consisting of the group of nucleotide sequences of SEQ ID Nos 7-8, and 
most preferably the nucleotide sequences of SEQ ID Nos 11-12. Other primers that are part of 
the invention and which may be included in this kit are primers comprising the nucleotide 
sequences of SEQ ID Nos 13 and 14. 
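
Because a microsequencing primer hybridizes immediately upstream of the polymorphic base, the base incorporated next identifies the allele present in the sample. The following sketch illustrates this reading step only. It is a simplification: the primer is matched against the strand it would extend along rather than annealed to a complementary strand, and the function name and sequences are hypothetical, not those of SEQ ID Nos 11-12.

```python
# Illustrative sketch: read the base immediately downstream of a primer site.
# Simplification: the primer is searched on the strand of identical sense.
# Sequences are hypothetical examples.


def microsequence(sample: str, primer: str):
    """Return the base immediately 3' of the primer site, or None if absent."""
    sample, primer = sample.upper(), primer.upper()
    position = sample.find(primer)
    if position == -1:
        return None
    next_index = position + len(primer)
    if next_index >= len(sample):
        return None
    return sample[next_index]


if __name__ == "__main__":
    primer = "GACCGTA"
    print(microsequence("TTGACCGTAAGCT", primer))  # first allele of the example
    print(microsequence("TTGACCGTAGGCT", primer))  # second allele of the example
```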



In a third embodiment, the kit comprises a primer which is complementary to any nucleotide 
sequence of the TBC-1 gene, and particularly of the genomic sequence of SEQ ID No 1, and is 
used to amplify the TBC-1 gene or a fragment thereof contained in the nucleic acid sample to be 
tested which includes a polymorphic base of at least one biallelic marker. Preferably, the 
amplified region includes a polymorphic base of at least one biallelic marker consisting of the 
nucleotide sequences selected from the group consisting of SEQ ID Nos 7-8. 

In a fourth embodiment, the kit comprises a hybridization DNA probe, that is or eventually 
becomes immobilized on a solid support, which is capable of hybridizing with the TBC-1 gene, 
particularly with the genomic sequence of SEQ ID No 1, or fragment thereof, preferably which is 
capable of hybridizing with a region of the TBC-1 gene which comprises an allele of a biallelic 
marker associated with prostate cancer, with an early onset of prostate cancer, with a 
susceptibility to prostate cancer, with the level of aggressiveness of prostate cancer tumors, 
with a modified expression of the TBC-1 gene, with a modified production of the TBC-1 protein, 
or with the production of a modified TBC-1 protein. The techniques for immobilizing a nucleotide 
primer or probe on a solid support are well-known to the skilled person and include, but are not 
limited to, the immobilization techniques described in the present application. In a preferred 
embodiment, the probe is selected from the group consisting of SEQ ID Nos 7-8 or fragment 
thereof. 

The kits of the present invention can also comprise optional elements including appropriate 
amplification reagents such as DNA polymerases when the kit comprises primers, reagents 
useful in hybridization reactions and reagents useful to reveal the presence of a hybridization 
reaction between a labeled hybridization probe and the TBC-1 gene containing at least one 
biallelic marker. In one embodiment, the biallelic marker comprises one of the sequences of 
SEQ ID Nos 7-8. 

Treatment of prostate cancer 

The invention also concerns a method for the treatment of prostate cancer comprising the 
following steps: 

- selecting an individual whose DNA comprises an allele of a biallelic marker or of a group of 
biallelic markers, preferably markers of the TBC-1 gene, associated with a susceptibility to 
prostate cancer; 

- monitoring in said individual the appearance and optionally the development of a tumor in the 
prostate; and 

- administering an effective amount of a medicament acting against prostate cancer to said 
individual at an appropriate stage of the prostate cancer. 

In some embodiments, the biallelic marker comprises at least one of the sequences of SEQ ID 
Nos 7-8. 

As used herein, the term "susceptibility to prostate cancer" is intended to designate a strong 
likelihood for an individual to develop in his or her lifetime a form of prostate cancer. This 
likelihood is believed to be strongly related to the association established between the biallelic 
markers of the present invention and prostate cancer or other more specific characteristics 
which can lead to the development of the prostate cancer such as the modified expression of 
the TBC-1 gene, the modified production of the TBC-1 protein or the production of a modified 
TBC-1 protein. 

The term "treatment of prostate cancer" when used herein is intended to designate the 
administration of substances either for prophylactic or curative purposes. When administered for 
prophylactic purposes, the treatment is provided in advance of the appearance of biologically or 
clinically significant cancer symptoms. When administered for curative purposes, the treatment 
is provided to attenuate the pathological symptoms of prostate cancer, to decrease the size or 
growth of cancer tumors or metastases or to remove them. 

The prophylactic administration of a treatment serves to prevent, attenuate or inhibit the growth 
of cancer cells. 

Another embodiment of the present invention is a method for the treatment of prostate cancer 
comprising the following steps: 

- selecting an individual whose DNA comprises an allele of a biallelic marker or of a group of 
biallelic markers, preferably markers of the TBC-1 gene, associated with a susceptibility to 
prostate cancer; and 

- administering to said individual, preferably as a preventive treatment of prostate cancer, an 
effective amount of a medicament acting against prostate cancer. 

In some embodiments, the biallelic marker comprises at least one of the sequences of SEQ ID 
Nos 7-8. 

In a further embodiment, the present invention concerns a method for the treatment of prostate 
cancer comprising the following steps: 

- selecting an individual whose DNA comprises an allele of a biallelic marker or of a group of 
biallelic markers, preferably markers of the TBC-1 gene, associated with a susceptibility to 
prostate cancer; 

- administering to said individual, as a preventive treatment of prostate cancer, an effective 
amount of a medicament acting against prostate cancer; 

- monitoring in said individual the appearance and optionally the development of a tumor in the 
prostate; and, if required 

- administering an effective amount of a medicament acting against prostate cancer to said 
individual at an appropriate stage of the prostate cancer. 

In some embodiments, the biallelic marker comprises at least one of the sequences of SEQ ID 
Nos 7-8. 

The present invention also concerns a method for the treatment of prostate cancer comprising 
the following steps: 

- selecting an individual suffering from a prostate cancer and whose DNA comprises an allele of 
a biallelic marker or of a group of biallelic markers, preferably markers of the TBC-1 gene, 
associated with an aggressive form of prostate cancer tumors; and 

- administering an effective amount of a medicament acting against prostate cancer to said 
individual. 

In some embodiments, the biallelic marker comprises at least one of the sequences of SEQ ID 
Nos 7-8. In particular embodiments, the individual is selected by genotyping one or more 
biallelic markers of the present invention. 

The invention also concerns a method for the treatment of prostate cancer in a selected 
population of individuals. The method comprises : 

- selecting an individual suffering from prostate cancer and whose DNA comprises an allele of a 
biallelic marker or of a group of biallelic markers, preferably markers of the TBC-1 gene, 
associated with a positive response to treatment with an effective amount of a medicament 
acting against prostate cancer, and/or whose DNA does not comprise an allele of a biallelic 
marker or of a group of biallelic markers, preferably markers of the TBC-1 gene, associated with 
a negative response to treatment with said medicament; and 

- administering at suitable intervals an effective amount of said medicament to said selected 
individual. 

In some embodiments, the biallelic marker comprises at least one of the sequences of SEQ ID 
Nos 7-8. In particular embodiments, the individual is selected by genotyping one or more 
biallelic markers of the present invention. 

In the context of the present invention, a "positive response" to a medicament can be defined as 
comprising a reduction of the symptoms related to the disease or condition to be treated. 

In the context of the present invention, a "negative response" to a medicament can be defined 
as comprising either a lack of positive response to the medicament, that is, the absence of a 
symptom reduction, or a side-effect observed following administration of the medicament. 

Methods for screening candidate substances or molecules of interest. 

Another object of the present invention consists of methods and kits for the screening of 
candidate substances that interact with the TBC-1 protein described herein as well as methods 
and kits for the screening of substances that are able to modulate the expression of the TBC-1 
gene. 

1. Substances or molecules interacting with the TBC-1 protein according to the 
invention. 

The present invention pertains to methods for screening substances or molecules that interact with 
the TBC-1 protein or one peptide fragment or variant thereof. 

In a first screening method embodiment, such a method for the screening of a candidate 
substance comprises the following steps: 

a) providing a polypeptide comprising the amino acid sequence of SEQ ID No 5, or a peptide 
fragment or a variant thereof; 

b) obtaining a candidate substance; 

c) bringing into contact said polypeptide with said candidate substance; 

d) detecting the complexes formed between said polypeptide and said candidate substance. 

In one embodiment of the screening method defined above, the complexes formed between the 
polypeptide and the candidate substance are further incubated in the presence of a polyclonal 
or a monoclonal antibody that specifically binds to the TBC-1 protein of the invention or to the 
peptide fragment or variant thereof. 

The candidate substance or molecule to be assayed for interacting with the TBC-1 polypeptide 
may be of diverse nature, including, without being limited to, natural or synthetic organic 
compounds or molecules of biological origin such as polypeptides. 


In another embodiment of the present screening method, increasing concentrations of a 
substance competing for binding to the TBC-1 protein with the considered candidate substance 
are added, simultaneously or prior to the addition of the candidate substance or molecule, when 
performing step c) of said method. By this technique, the detection and optionally the 
quantification of the complexes formed between the TBC-1 protein or the peptide fragment or 
variant thereof and the candidate substance or molecule to be screened allows one skilled 
in the art to determine the affinity value of said substance or molecule for said TBC-1 protein or 
the peptide fragment or variant thereof. 

The invention also pertains to kits useful for performing the hereinbefore described screening 
method. Preferably, such kits comprise a TBC-1 protein having the amino acid sequence of 
SEQ ID No 5 or a peptide fragment or a variant thereof, and optionally means useful to detect 
the complex formed between the TBC-1 protein or its peptide fragment or variant and the 
candidate substance. In a preferred embodiment the detection means consist of monoclonal or 
polyclonal antibodies directed against the TBC-1 protein or a peptide fragment or a variant 
thereof. 

2. Substances or molecules modulating the expression of the TBC-1 gene. 

The present invention also concerns a method for screening substances or molecules that are able 
to increase, or in contrast to decrease or even suppress the expression of the TBC-1 gene. Such a 
method may allow one skilled in the art to select substances exerting a regulating effect on the 
expression level of the TBC-1 gene and which may be useful for expressing a desired 
polynucleotide, for example specifically in a prostate tissue. 

The expressions "gene expression" or "gene expression level" according to the invention are 
intended to designate gene activity both at the transcriptional and post-transcriptional level. 

The invention also pertains to kits useful for performing the hereinbefore described screening 
method. Preferably, such kits comprise a recombinant vector that allows the expression of a 
nucleic acid comprising a nucleotide sequence of SEQ ID No 2 or alternatively a recombinant 
cell host containing such a recombinant vector. 

Thus, also part of the present invention is a method for the screening of a candidate substance or 
molecule that modulates the expression of the TBC-1 gene according to the invention, wherein 
this method comprises the following steps: 

a) providing a recombinant cell host containing a nucleic acid, wherein said nucleic acid 
comprises the nucleotide sequence of SEQ ID No 2 or a biologically active fragment or 
variant thereof, the nucleotide sequence of SEQ ID No 2 or its biologically active fragment or 
variant being operably linked to a polynucleotide encoding a detectable protein; 

b) obtaining a candidate substance, and 

c) determining the ability of the candidate substance to modulate the expression levels of the 
polynucleotide encoding the detectable protein. 

In a preferred embodiment of the above screening method, the nucleic acid comprising the 
nucleotide sequence of SEQ ID No 2 or a biologically active fragment or variant thereof also 
includes a 5'UTR region of one of the TBC-1 cDNAs of SEQ ID Nos 3 and 4, or one of their 
biologically active fragments or variants. 

The invention also pertains to kits useful for performing the hereinbefore described screening 
method. Preferably, such kits comprise: 

a) a recombinant vector that allows the expression of a nucleic acid comprising a nucleotide 
sequence of SEQ ID No 2 or a biologically active fragment or variant thereof; 






b) a polynucleotide encoding a detectable protein which is operably linked to the nucleotide 
sequence of SEQ ID No 2 or a biologically active fragment or variant thereof. 



A second method for the screening of a candidate substance or molecule that modulates the 
expression of the TBC-1 gene comprises the following steps: 

a) providing a recombinant cell host containing a nucleic acid, wherein said nucleic acid 
comprises a 5'UTR sequence of one of the TBC-1 cDNAs of SEQ ID Nos 3 and 4, or one of 
their biologically active fragments or variants, the 5'UTR sequence or its biologically active 
fragment or variant being operably linked to a polynucleotide encoding a detectable protein; 

b) obtaining a candidate substance, and 

c) determining the ability of the candidate substance to modulate the expression levels of the 
polynucleotide encoding the detectable protein. 
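
By way of example, step c) can be evaluated by comparing the signal of the detectable protein in treated and untreated recombinant cell hosts. The specification does not set a numerical criterion for modulation; the two-fold threshold, the function name and the readings below are assumptions chosen only to illustrate the comparison.

```python
# Illustrative sketch: classify a candidate substance from reporter readings.
# The two-fold threshold and all readings are assumptions, not specified values.
from statistics import mean


def classify_modulation(control: list, treated: list, min_fold: float = 2.0):
    """Return (fold_change, label) comparing treated to control reporter signal."""
    baseline = mean(control)
    if baseline <= 0:
        raise ValueError("control signal must be positive")
    fold_change = mean(treated) / baseline
    if fold_change >= min_fold:
        label = "increase"
    elif fold_change <= 1 / min_fold:
        label = "decrease"
    else:
        label = "no clear effect"
    return fold_change, label


if __name__ == "__main__":
    # Hypothetical reporter readings, e.g. arbitrary fluorescence units.
    print(classify_modulation([100, 110, 90], [300, 330, 270]))
```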

In a preferred embodiment of the screening method described above, the nucleic acid that 
comprises a nucleotide sequence selected from the group consisting of the 5'UTR sequence of 
one of the TBC-1 cDNAs of SEQ ID Nos 3 and 4 or one of their biologically active fragments or 
variants, includes a promoter sequence, wherein said promoter sequence can be either 
endogenous, or in contrast exogenous with respect to the TBC-1 5'UTR sequences defined 
therein. 

A preferred endogenous promoter sequence consists of a purified or isolated nucleic acid 
comprising the TBC-1 regulatory sequence of SEQ ID No 2, or a biologically active fragment or 
variant thereof. 

Among the preferred polynucleotides encoding a detectable protein, there may be cited 
polynucleotides encoding beta galactosidase, green fluorescent protein (GFP) and 
chloramphenicol acetyl transferase (CAT). 

In another embodiment of a screening method according to the invention, the polynucleotide 
encoding a detectable protein is selected from the group consisting of the nucleotide sequences of 
SEQ ID Nos 3 and 4, and coding fragments or variants thereof. 

The invention also pertains to kits useful for performing the hereinbefore described second 
screening method. Preferably, such kits comprise: 

a) a recombinant vector that comprises a nucleic acid including a 5'UTR sequence of one of the 
TBC-1 cDNAs of SEQ ID Nos 3 and 4, or one of their biologically active fragments or variants, 
the 5'UTR sequence or its biologically active fragment or variant being operably linked to a 
polynucleotide encoding a detectable protein. 

Preferably, the regulatory sequence contained in the recombinant vector described above is 
located upstream of the polynucleotide encoding a detectable protein. 

For the design of suitable recombinant vectors useful for performing the screening methods 
described above, reference is made to the section of the present specification wherein the preferred 
recombinant vectors of the invention are detailed. 

Variants and fragments 
1. Polynucleotides 

The invention also relates to variants and fragments of the polynucleotides described herein. 

Variants of polynucleotides, as the term is used herein, are polynucleotides that differ from a 
reference polynucleotide. A variant of a polynucleotide may be a naturally occurring variant such as 
a naturally occurring allelic variant, or it may be a variant that is not known to occur naturally. Such 
non-naturally occurring variants of the polynucleotide may be made by mutagenesis techniques, 
including those applied to polynucleotides, cells or organisms. Generally, differences are limited so 
that the nucleotide sequences of the reference and the variant are closely similar overall and, in 
many regions, identical. 

Variants of polynucleotides according to the invention include, without being limited to, nucleotide
sequences which are at least 95% identical to a polynucleotide selected from the group consisting
of the nucleotide sequences of SEQ ID Nos 1-4 or to any polynucleotide fragment of at least 8
consecutive nucleotides of a polynucleotide selected from the group consisting of the nucleotide
sequences of SEQ ID Nos 1 and 2, and preferably at least 99% identical, more particularly at least
99.5% identical, and most preferably at least 99.8% identical to a polynucleotide selected from the
group consisting of the nucleotide sequences of SEQ ID Nos 1-4 or to any polynucleotide fragment
of at least 8 consecutive nucleotides of a polynucleotide selected from the group consisting of the
nucleotide sequences of SEQ ID Nos 1-2.

Changes in the nucleotides of a variant may be silent, which means that they do not alter the amino
acids encoded by the polynucleotide.

However, nucleotide changes may also result in amino acid substitutions, additions, deletions,
fusions and truncations in the polypeptide encoded by the reference sequence. The substitutions,
deletions or additions may involve one or more nucleotides. The variants may be altered in coding
or non-coding regions or both. Alterations in the coding regions may produce conservative or
non-conservative amino acid substitutions, deletions or additions.
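
The distinction between silent and non-silent nucleotide changes can be illustrated with a minimal Python sketch. It assumes the standard genetic code and uses a short hypothetical coding sequence, not any sequence of the invention; the function names are illustrative only.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate a DNA coding sequence into one-letter amino acids ('*' = stop)."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore an incomplete trailing codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def classify_change(reference, variant):
    """Return 'silent' if both sequences encode the same peptide, else 'non-silent'."""
    return "silent" if translate(reference) == translate(variant) else "non-silent"


reference = "ATGCTTGAA"  # hypothetical sequence encoding Met-Leu-Glu
print(classify_change(reference, "ATGCTCGAA"))  # CTT -> CTC, both encode Leu
print(classify_change(reference, "ATGCTTGAT"))  # GAA -> GAT, Glu -> Asp
```

Whether a non-silent substitution is conservative depends on the chemical similarity of the residues involved, which this sketch does not attempt to score.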

In the context of the present invention, particularly preferred embodiments are those in which the
polynucleotides encode polypeptides which retain substantially the same biological function or
activity as the mature TBC-1 protein.

A polynucleotide fragment is a polynucleotide having a sequence that is entirely the same as part
but not all of a given nucleotide sequence, preferably the nucleotide sequence of the TBC-1 gene,
and variants thereof. The fragment can be a portion of the regulatory sequences of the TBC-1
gene.

Such fragments may be "free-standing", i.e. not part of or fused to other polynucleotides, or they
may be comprised within a single larger polynucleotide of which they form a part or region.
However, several fragments may be comprised within a single larger polynucleotide.

As representative examples of polynucleotide fragments of the invention, there may be mentioned
[...] 100 or 100 to 200 nucleotides in length.

2. Polypeptides

The invention also relates to variants, fragments, analogs and derivatives of the polypeptides
described herein.

The variant may be 1) one in which one or more of the amino acid residues are substituted with a
conserved or non-conserved amino acid residue (preferably a conserved amino acid residue) and
such substituted amino acid residue may or may not be one encoded by the genetic code, or 2)
one in which one or more of the amino acid residues includes a substituent group, or 3) one in
which the TBC-1 protein of the invention is fused with another compound, such as a compound to
increase the half-life of the polypeptide (for example, polyethylene glycol), or 4) one in which
additional amino acids are fused to the TBC-1 protein of the invention under consideration, such as
a leader or secretory sequence or a sequence which is employed for purification of this TBC-1
protein or a preprotein sequence. Such variants are deemed to be within the scope of those skilled
in the art.

A polypeptide fragment is a polypeptide having a sequence that is entirely the same as part but not
all of a given polypeptide sequence, preferably a polypeptide encoded by the TBC-1 gene and
nucleotide variants thereof. Preferred fragments include those regions possessing antigenic
properties and which can be used to raise antibodies against the TBC-1 protein.

Such fragments may be "free-standing", i.e. not part of or fused to other polypeptides, or they may
be comprised within a single larger polypeptide of which they form a part or region. However,
several fragments may be comprised within a single larger polypeptide.

As representative examples of polypeptide fragments of the invention, there may be mentioned
those which are from about 5, 6, 7, 8, 9 or 10 to 15, 10 to 20, 15 to 40, or 30 to 55 amino acids
long. Preferred are those fragments containing at least one amino acid mutation in the TBC-1
protein.



BRIEF DESCRIPTION OF THE DRAWINGS 
Figure 1 : 

A) Partial TBC-1 genomic sequence, first cDNA molecule and associated coding sequence
(CDS).

B) Partial TBC-1 genomic sequence, second cDNA molecule and associated coding sequence
(CDS).

Figure 2 : An amino acid alignment of a portion of the amino acid sequence of the TBC-1 
protein of SEQ ID No 5 with other proteins sharing amino acid homology with TBC-1. 

DETAILED DESCRIPTION OF THE INVENTION 

The present invention concerns polynucleotides and polypeptides related to the human TBC-1
gene (also termed "TBC-1 gene" throughout the present specification), which is potentially
involved in the regulation of the differentiation of various cell types in mammals. A deregulation
of the TBC-1 protein may be involved in the generation of a pathological state related to cell
differentiation in a patient, more particularly to abnormal cell proliferation leading to cancer
states, such as prostate cancer.

The identification of genes involved in a particular trait such as [...] can
be carried out through two main strategies currently used for genetic mapping: linkage
analysis and association studies. Linkage analysis requires the study of families with multiple
affected individuals and is now useful in the detection of mono- or oligogenic inherited traits.
Conversely, association studies examine the frequency of marker alleles in unrelated trait
positive (T+) individuals compared with trait negative (T-) controls, and are generally employed
in the detection of polygenic inheritance.
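
The comparison of marker allele frequencies between trait positive and trait negative individuals can be sketched as a 2x2 chi-square test. This is a minimal illustration only: the allele counts are hypothetical, the test is an uncorrected Pearson chi-square with one degree of freedom, and it is not stated in this section which statistic the inventors used.

```python
import math


def allele_association(case_a, case_b, control_a, control_b):
    """Pearson chi-square (1 df, no continuity correction) on a 2x2 allele-count table.

    Returns (chi2, p_value, odds_ratio) for allele A versus allele B.
    """
    observed = ((case_a, case_b), (control_a, control_b))
    row_totals = (case_a + case_b, control_a + control_b)
    col_totals = (case_a + control_a, case_b + control_b)
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (case_a * control_b) / (case_b * control_a)
    return chi2, p_value, odds_ratio


# Hypothetical counts: allele A seen 60 times in T+ chromosomes, 40 times in T- chromosomes.
chi2, p, odds = allele_association(60, 40, 40, 60)
print(f"chi2={chi2:.2f}  p={p:.4f}  odds ratio={odds:.2f}")
```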

Candidate region on the chromosome 4 (linkage analysis).

[...] the prostate cancer gene(s), starting from families, a systematic [...]
using markers [...]
Genethon laboratory by the Jean Weissenbach team (Dib et al., 1996), [...]
is incorporated herein by reference.

The studies of genetic link or of "linkage" are based on the principle according to which two
neighboring sequences on a chromosome do not present (or very rarely present)
recombinations by crossing-over during meiosis. To do this, microsatellite DNA sequences
[...] markers, constantly co-inherited with the disease studied, are searched for in a
family having a predisposition for this disease. These DNA sequences organized in the form of
[...]
markers used, according to the genealogical tree, the transmission of the disease and
the transmission of the markers. Thus, if a particular allele of a given marker is transmitted with
the disease more often than chance would have it (recombination level between 0 and 0.5), it
is possible to deduce that the target gene in question is found in the neighborhood of the marker.



Using this technique, it has been possible to localize several genes of genetic predisposition to
familial cancers. In order to be included in a genetic link study, the families affected
by a hereditary form of the disease must satisfy the "informativeness" criteria: several affected
subjects (and whose constitutional DNA is available) per generation, and at best having a large
number of siblings.

By linkage analysis, the inventors have identified a candidate region for prostate cancer on
chromosome 4. Indeed, the two-point LOD scores between the disease and the markers on a
total population of approximately fifty families present a value of 2.49 for marker D4S398, which
indicates a probable genetic link with this marker. The curve of the variation of the LOD score
on a map of 5 markers is centered on D4S398 and the value higher than 3.3 indicates that a
gene involved in familial prostate cancer is probably found in the region located between
markers D4S2978 and D4S3018, or a space of approximately 9.7 cM.
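
For readers unfamiliar with LOD scores, the following minimal sketch computes a two-point LOD score under the textbook simplification of phase-known, fully informative meioses. The counts are hypothetical and are not intended to reproduce the values of 2.49 or 3.3 reported above, which come from pedigree-based calculations whose details are not given in this section.

```python
import math


def lod_score(recombinants, meioses, theta):
    """Two-point LOD score: log10 of L(theta) / L(theta = 0.5).

    Assumes phase-known meioses, so that
    L(theta) = theta**recombinants * (1 - theta)**(meioses - recombinants).
    """
    if not 0 < theta <= 0.5:
        raise ValueError("theta must lie in (0, 0.5]")
    non_recombinants = meioses - recombinants
    log_l_theta = (recombinants * math.log10(theta)
                   + non_recombinants * math.log10(1 - theta))
    log_l_null = meioses * math.log10(0.5)
    return log_l_theta - log_l_null


# Hypothetical data: 1 recombinant observed among 10 informative meioses.
thetas = [0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5]
scores = {theta: lod_score(1, 10, theta) for theta in thetas}
best_theta = max(scores, key=scores.get)
print(best_theta, round(scores[best_theta], 3))
```

A LOD score of 0 at theta = 0.5 corresponds to free recombination, which is why positive scores at smaller theta values are read as evidence for linkage.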

The inventors have subsequently established a physical map of the candidate region. Namely,
they have covered the candidate region with a group of cloned and ordered genomic DNA
fragments.

Then biallelic markers located on the human chromosome 4 candidate region described above
have been generated by designing suitable primers able to amplify different nucleotide
sequences contained in the genomic candidate region using a pool of DNAs from individuals or
using a collection of separate DNA samples from individuals. The amplified nucleic acids are
then sequenced in order to identify polymorphisms between individuals, and more preferably
single nucleotide polymorphisms (SNPs) that define biallelic markers.
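
The identification of biallelic sites among sequenced amplicons can be sketched as a column-by-column comparison of aligned sequences. This is a minimal illustration assuming equal-length, already aligned reads from separate individuals; the sequences and function name are hypothetical.

```python
from collections import Counter


def find_biallelic_sites(aligned_sequences):
    """Return {1-based position: (allele, allele)} for columns showing exactly two bases."""
    length = len(aligned_sequences[0])
    if any(len(seq) != length for seq in aligned_sequences):
        raise ValueError("all sequences must have the same aligned length")
    sites = {}
    for index in range(length):
        counts = Counter(seq[index].upper() for seq in aligned_sequences)
        alleles = sorted(base for base in counts if base in "ACGT")
        if len(alleles) == 2:
            sites[index + 1] = tuple(alleles)
    return sites


# Hypothetical amplicon sequences from four individuals.
samples = ["ACGTTGCA", "ACGTTGCA", "ACATTGCA", "ACATTGCC"]
print(find_biallelic_sites(samples))  # {3: ('A', 'G'), 8: ('A', 'C')}
```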

As will be described in detail further in this specification, the inventors have found that the
presence of one of the two alleles of a novel particular biallelic marker (namely, marker
99-430-352) located in the chromosome 4 candidate region described hereinbefore is statistically
closely related to the occurrence of prostate cancer in individuals harboring this allele of biallelic
marker 99-430-352 within their genome, and particularly with sporadic prostate cancer cases.

Based on the linkage analysis and on the results of the association between the biallelic marker
99-430-352 and prostate cancer described above, the chromosome 4 genomic region
surrounding this biallelic marker has been suspected by the inventors to contain a genetic
determinant involved in prostate cancer, either as a causal determinant of the disease or as a
co-factor involved in the development of the malignancy state leading to prostate cancer. In
order to characterize this genetic determinant, the inventors have sequenced the genomic
region surrounding biallelic marker 99-430-352. The sequencing of this genomic region has
allowed the inventors to identify a novel human gene, the nucleotide sequence of which harbors
biallelic marker 99-430-352.

Homologies of the novel human gene translation product with a known murine protein.

Database homology searches have allowed the inventors to determine that the translation
product of this novel human gene has significant identity with a murine protein called tbc1. The
novel human gene of the invention has thus been called TBC-1 throughout the present
specification. TBC-1 comprises an open reading frame that encodes a novel protein, the TBC-1
protein. Based on sequence similarity, namely an alignment of a portion of the TBC-1 amino acid
sequence with the known tbc1 murine protein, it is expected that the TBC-1 protein may play a role
in the cell cycle and in differentiation of various tissues. Indeed, the TBC-1 protein contains a
200 amino acid domain called the TBC domain that is homologous to regions in the tre-2
oncogene and in the yeast regulators of mitosis BUB2 and cdc16.

The cDNA of the murine tbc1 gene has been described in US Patent No 5,700,927 and it
encodes a putative protein product of 1141 amino acids. The N-terminus of the murine tbc1
protein contains stretches of cysteines and histidines which may form zinc finger structures in
the mature polypeptides. The N-terminus also comprises short stretches of basic amino acids
which may be involved in a nuclear localization signal. The TBC domain of the tbc1
protein contains several tyrosine residues which are conserved in BUB2 and cdc16. The
C-terminus of the murine tbc1 protein contains a long stretch of evenly spaced leucine residues
which are likely to form a leucine zipper motif.

The murine tbc1 has been shown to be highly expressed in testis [...];
lower levels of expression have also been identified in lung, spleen, brain, and head. Moreover,
murine tbc1 is a nuclear protein which is expressed in a cell- and stage-specific manner.

Studies of murine bone marrow have demonstrated that erythroid cells and megakaryocytes
expressed substantial levels of the murine tbc1 protein, but none was detected in mature
neutrophils. Similarly, spermatogonia do not express murine tbc1, but primary and secondary
spermatocytes express abundant tbc1. Later in the differentiation of the germ cells, the tbc1
levels appear to decrease in spermatids and active sperm. The differentiation program of
spermatogonia to spermatocytes therefore involves a significant upregulation of murine tbc1
expression.

The general distribution of murine tbc1 is not tissue-specific, but is cell-specific within individual
tissues and intimately linked to tissue differentiation. The developmental expression of murine
tbc1, particularly in hematopoietic and germ cells, suggests that this gene plays a role in the
terminal differentiation program of several tissues.

Consequently, an alteration in the expression of the TBC-1 gene or in the amino acid sequence
of the TBC-1 protein leading to an altered biological activity of the latter is likely to cause,
directly or indirectly, cell proliferation disorders and thus diseases related to an abnormal cell
proliferation such as cancer, particularly prostate cancer.

A. TBC-1 GENOMIC SEQUENCES, cDNAs AND TBC-1 REGULATORY
POLYNUCLEOTIDES.

The invention concerns a purified or isolated nucleic acid encoding a TBC-1 polypeptide as well 
as a nucleic acid complementary thereto and a fragment or variant thereof. 

TBC-1 genomic sequences

The inventors have sequenced a portion of the TBC-1 genomic sequence. A map of the partially
sequenced TBC-1 gene is depicted in the upper line of Figure 1-A) and 1-B). The upper line of
Figure 1-A) and 1-B) shows the respective locations of the first three exons of the TBC-1 gene,
designated respectively as Exon 1, Exon 1bis and Exon 2. The position of the first nucleotide at
the 5'-end of each exon is also indicated, the nucleotide at position 1 being the first nucleotide
at the 5'-end of the polynucleotide of SEQ ID No 1.

More precisely, the structural characteristics of the TBC-1 genomic sequence, as represented in
Figure 1, are as follows:

a) a regulatory region located between the nucleotide at position 1 and the nucleotide at position
1999 of SEQ ID No 1.

b) a transcribed region, located between the nucleotide at position 2000 and the nucleotide at
position 17589 of SEQ ID No 1, this region comprising Exon 1, Exon 1bis and Exon 2 of
the TBC-1 gene.

Exon 1 starts at the nucleotide in position 2000 and ends at the nucleotide in position 2076 of
the nucleotide sequence of SEQ ID No 1.

Exon 1bis starts at the nucleotide in position 12291 and ends at the nucleotide in position 12372
of the nucleotide sequence of SEQ ID No 1.

Exon 2 starts at the nucleotide in position 12739 and ends at the nucleotide in position 13248 of the
nucleotide sequence of SEQ ID No 1.

The translation start site ATG is located within Exon 2 at nucleotide positions 12832-12835 of
the nucleotide sequence of SEQ ID No 1.

The TBC-1 introns defined hereinafter for the purpose of the present invention are not exactly
what is generally understood as "introns" by one skilled in the art and will consequently be
further defined below.

Generally, an intron is defined as a nucleotide sequence that is present both in the genomic
DNA and in the unspliced mRNA molecule, and which is absent from the mRNA molecule which
has already gone through splicing events. In the case of the TBC-1 gene, the inventors have
found that at least two different spliced mRNA molecules are produced when this gene is
transcribed, as will be described in detail in a further section of the specification. The first
spliced mRNA molecule comprises Exons 1 and 2 as shown in Figure 1-A). Thus, the genomic
nucleotide sequence comprised between Exon 1 and Exon 2 is an intronic sequence with regard
to this first mRNA molecule, despite the fact that this intronic sequence contains Exon 1bis. In
contrast, Exon 1bis is of course an exonic nucleotide sequence with regard to the second TBC-1
mRNA molecule shown in Figure 1-B).

For the purpose of the present invention and in order to make a clear and unambiguous
designation of the different nucleic acids encompassed, it has been postulated that the
polynucleotides contained both in the nucleotide sequence of SEQ ID No 1 and in any of the
nucleotide sequences of SEQ ID Nos 3 or 4 are considered as exonic sequences. Conversely,
the polynucleotides contained in the nucleotide sequence of SEQ ID No 1 and located between
Exon 1 and Exon 2, but which are absent both from the nucleotide sequence of SEQ ID No 3
and from the nucleotide sequence of SEQ ID No 4, are considered as intronic sequences.

Consequently, Intron 1 of TBC-1 starts at the nucleotide in position 2077 and ends at the
nucleotide in position 12290 of the nucleotide sequence of SEQ ID No 1. Intron 1bis starts at the
nucleotide in position 12373 and ends at the nucleotide in position 12738 of the nucleotide
sequence of SEQ ID No 1.
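
The intron boundaries given above follow directly from the exon coordinates under the convention adopted in this specification (everything between two consecutive exons of SEQ ID No 1 is treated as intronic). A minimal sketch, in which the exon coordinates are taken from the text and the function name is illustrative:

```python
# Exon coordinates on SEQ ID No 1 (1-based, inclusive), as stated in the specification.
EXONS = [
    ("Exon 1", 2000, 2076),
    ("Exon 1bis", 12291, 12372),
    ("Exon 2", 12739, 13248),
]


def introns_between(exons):
    """Return (start, end) of each gap between consecutive exons, 1-based inclusive."""
    ordered = sorted(exons, key=lambda exon: exon[1])
    return [(upstream[2] + 1, downstream[1] - 1)
            for upstream, downstream in zip(ordered, ordered[1:])]


for name, (start, end) in zip(("Intron 1", "Intron 1bis"), introns_between(EXONS)):
    print(f"{name}: {start}-{end} ({end - start + 1} nucleotides)")
```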

The nucleic acids defining the TBC-1 introns described above, as well as their fragments and 
variants, may be used as oligonucleotide primers or probes in order to detect the presence of a 
copy of the TBC-1 gene in a test sample, or alternatively in order to amplify a target nucleotide 
sequence within the TBC-1 intronic sequences. 

The present invention pertains to a purified or isolated nucleic acid encoding a human TBC-1 
protein, wherein said TBC-1 protein comprises an amino acid sequence of SEQ ID No 5, a 
nucleotide sequence complementary thereto, a fragment or a variant thereof. 

The present invention also concerns a purified or isolated nucleic acid comprising at least 20 
consecutive nucleotides of the nucleotide sequence of SEQ ID No 1 or a sequence 
complementary thereto. Such a nucleic acid may comprise at least 25, 30, 40 or 50 consecutive 
nucleotides of the nucleotide sequence of SEQ ID No 1 . 

Another object of the invention relates to a purified or isolated nucleic acid comprising at least 8
consecutive nucleotides of an intronic sequence of the TBC-1 genomic sequence of SEQ ID No
1, and particularly of Intron 1 and Intron 1bis as defined above.

Thus, the invention also deals with a nucleic acid probe or primer comprising at least 8
consecutive nucleotides of a polynucleotide selected from the group consisting of:

a) the nucleotide sequence beginning at the nucleotide in position 2077 and ending at the
nucleotide in position 12290 of the nucleotide sequence of SEQ ID No 1 (Intron 1);

b) the nucleotide sequence beginning at the nucleotide in position 12373 and ending at the
nucleotide in position 12738 of the nucleotide sequence of SEQ ID No 1 (Intron 1bis).
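
As an illustration of what "at least 8 consecutive nucleotides" of a region means in practice, the sketch below enumerates every 8-nucleotide window of a region defined by 1-based inclusive positions, together with its complementary strand. The sequence is a short hypothetical stand-in, since SEQ ID No 1 is not reproduced in this section, and the function names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def subsequence(sequence, start, end):
    """Extract positions start..end (1-based, inclusive), as numbered in the text."""
    return sequence[start - 1:end]


def reverse_complement(sequence):
    """Return the complementary strand, written 5' to 3'."""
    return sequence.upper().translate(COMPLEMENT)[::-1]


def candidate_probes(sequence, start, end, length=8):
    """List every window of `length` consecutive nucleotides within the region."""
    region = subsequence(sequence, start, end)
    return [region[i:i + length] for i in range(len(region) - length + 1)]


toy_sequence = "GGATCCATGCAAGTTTCC"  # hypothetical, not part of SEQ ID No 1
for probe in candidate_probes(toy_sequence, 3, 12):
    print(probe, reverse_complement(probe))
```

Selecting a usable probe or primer from such windows would additionally require criteria such as melting temperature and specificity, which are outside the scope of this sketch.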



TBC-1 cDNAs

The first three exons of the TBC-1 gene are represented in the upper line of Figure 1-A) and 1-B):

- Exon 1 spans from the nucleotide at position 2000 to the nucleotide at position 2076 of the
nucleotide sequence of SEQ ID No 1; Exon 1 spans from the
nucleotide in position 1 to the nucleotide in position 76 of the nucleotide sequence of SEQ ID No
3;

- Exon 1bis spans from the nucleotide at position 12291 to the nucleotide at position 12372 of
the nucleotide sequence of SEQ ID No 1; Exon 1bis spans from the nucleotide in position 1 to
the nucleotide in position 81 of the nucleotide sequence of SEQ ID No 4;

- Exon 2 spans from the nucleotide at position 12739 to the nucleotide at position 13248 of the
nucleotide sequence of SEQ ID No 1; Exon 2 spans from the nucleotide in position 77 to the
nucleotide in position 586 of the nucleotide sequence of SEQ ID No 3; Exon 2 spans from the
nucleotide in position 82 to the nucleotide in position 591 of the nucleotide sequence of SEQ ID
No 4.

The inventors have discovered that the expression of the TBC-1 gene leads to the production of
at least two mRNA molecules, respectively a first and a second TBC-1 transcription product, as
a result of alternative splicing events.

The middle line of Figure 1-A) depicts the main structural features of a purified or isolated
nucleic acid consisting of a TBC-1 cDNA corresponding to a first TBC-1 mRNA molecule that
has been obtained after reverse transcribing an mRNA generated after transcription of the TBC-1
gene. This mRNA has a nucleotide length of about 4 kilobases.

The first transcription product comprises Exons 1 and 2. This cDNA of SEQ ID No 3 includes a
5'-UTR region, spanning the whole Exon 1 and part of Exon 2. This 5'-UTR region starts from
the nucleotide at position 1 and ends at the nucleotide at position 170 of the nucleotide
sequence of SEQ ID No 3. The cDNA of SEQ ID No 3 includes a 3'-UTR region starting from
the nucleotide at position 3729 and ending at the nucleotide at position 3984 of the nucleotide
sequence of SEQ ID No 3. This first transcription product harbors a polyadenylation site located
between the nucleotide at position 3942 and the nucleotide at position 3947 of the nucleotide
sequence of SEQ ID No 3. The ORF encoding the TBC-1 protein is comprised between the
nucleotide in position 171 and the nucleotide in position 3728 of the nucleotide sequence of
SEQ ID No 3.

The middle line of Figure 1-B) depicts the main structural features of a purified or isolated
nucleic acid consisting of a TBC-1 cDNA corresponding to a second TBC-1 mRNA molecule.
The 5'-end sequence of this second TBC-1 mRNA, more particularly the nucleotide sequence
comprised between the nucleotide in position 1 and the nucleotide in position 458 of the nucleic
acid of SEQ ID No 4, corresponds to the nucleotide sequence of a 5'-EST that has
been obtained from a human pancreas cDNA library and characterized following the teachings
of the PCT Application No WO 96/34981. This 5'-EST is also part of the invention and is defined
by the nucleotide sequence of SEQ ID No 6.

The second TBC-1 transcription product comprises Exons 1bis and 2. This cDNA of SEQ ID No
4 includes a 5'-UTR region starting from the nucleotide at position 1 and ending at the
nucleotide at position 175 of the nucleotide sequence of SEQ ID No 4. This second cDNA also
includes a 3'-UTR region starting from the nucleotide at position 3734 and ending at the
nucleotide at position 3989 of the nucleotide sequence of SEQ ID No 4. This second
transcription product harbors a polyadenylation site located between the nucleotide at position
3947 and the nucleotide at position 3952 of the nucleotide sequence of SEQ ID No 4. The ORF
encoding the TBC-1 protein is comprised between the nucleotide in position 176 and the
nucleotide in position 3733 of the nucleotide sequence of SEQ ID No 4.
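
The feature coordinates stated for the two cDNAs can be cross-checked with a few lines of arithmetic. The sketch below uses only the positions given in the text; the dictionary layout and function names are illustrative. It verifies that the 5'-UTR, ORF and 3'-UTR are contiguous and that the ORF length is a multiple of three. Whether the stated ORF span includes the stop codon is not specified here, so no protein length is derived.

```python
# Feature coordinates (1-based, inclusive) as stated for each cDNA.
TRANSCRIPTS = {
    "SEQ ID No 3": {"five_utr": (1, 170), "orf": (171, 3728), "three_utr": (3729, 3984)},
    "SEQ ID No 4": {"five_utr": (1, 175), "orf": (176, 3733), "three_utr": (3734, 3989)},
}


def span_length(span):
    """Length of a 1-based inclusive (start, end) span."""
    start, end = span
    return end - start + 1


def check_transcript(features):
    """Check that the features abut one another and that the ORF is in frame."""
    contiguous = (features["five_utr"][1] + 1 == features["orf"][0]
                  and features["orf"][1] + 1 == features["three_utr"][0])
    orf_length = span_length(features["orf"])
    return {"orf_length": orf_length,
            "in_frame": orf_length % 3 == 0,
            "contiguous": contiguous}


for name, features in TRANSCRIPTS.items():
    print(name, check_transcript(features))
```

Both transcripts give the same ORF length, consistent with the statement that they differ only in their first exon.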

Another object of the invention consists of a purified or isolated nucleic acid comprising a
polynucleotide selected from the group consisting of the nucleotide sequences of SEQ ID Nos 3
and 4 and nucleic acid fragments thereof.

Preferred nucleic acid fragments of the nucleotide sequences of SEQ ID Nos 3 and 4 consist in
polynucleotides comprising their respective Open Reading Frames encoding the TBC-1 protein.

Other preferred nucleic acid fragments of the nucleotide sequences of SEQ ID Nos 3 and 4 
consist in polynucleotides comprising at least a part of their respective 5'-UTR or 3'-UTR 
regions. 

The invention also pertains to a purified or isolated nucleic acid having at least 95%
nucleotide identity with any one of the nucleotide sequences of SEQ ID Nos 3 and 4, or a
fragment thereof.
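
A percentage of nucleotide identity over an alignment can be sketched as follows. This is a minimal illustration assuming the common convention that identical columns are counted and divided by the total number of columns in the comparison window, with gap columns counted as mismatches; the sequences are hypothetical and the alignment itself is taken as given.

```python
def percent_identity(aligned_a, aligned_b):
    """Percentage of alignment columns carrying the same base in both sequences.

    Both inputs must already be aligned to equal length, with '-' marking gaps.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = sum(1 for a, b in zip(aligned_a.upper(), aligned_b.upper())
                  if a == b and a != "-")
    return 100.0 * matches / len(aligned_a)


def meets_threshold(aligned_a, aligned_b, threshold=95.0):
    """True if the identity reaches the threshold (95% is the figure used in the text)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))  # 90.0
print(percent_identity("ACGT-CGT", "ACGTACGT"))      # 87.5
```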

"Percentage of nucleotide identity" for the polynucleotides of the invention is determined by 
comparing two optimally aligned sequences over a comparison window, wherein the portion of 
the polynucleotide or polypeptide sequence in the comparison window may comprise additions 
or deletions (i.e., gaps) as compared to the reference sequence (which does not comprise
additions or deletions) for optimal alignment of the two sequences. The percentage is [...]

Genomic sequences lying upstream of the TBC-1 Exons are cloned into a suitable promoter
reporter vector, such as the pSEAP-Basic, pSEAP-Enhancer, pβgal-Basic, pβgal-Enhancer, or
pEGFP-1 Promoter Reporter vectors available from Clontech. Briefly, each of these promoter
reporter vectors includes multiple cloning sites positioned upstream of a reporter gene encoding
a readily assayable protein such as secreted alkaline phosphatase, beta galactosidase, or
green fluorescent protein. The sequences upstream of the TBC-1 coding region are inserted
into the cloning sites upstream of the reporter gene in both orientations and introduced into an
appropriate host cell. The level of reporter protein is assayed and compared to the level
obtained from a vector which lacks an insert in the cloning site. The presence of an elevated
expression level in the vector containing the insert with respect to the control vector indicates
the presence of a promoter in the insert. If necessary, the upstream sequences can be cloned
into vectors which contain an enhancer for increasing transcription levels from weak promoter
sequences. A significant level of expression above that observed with the vector lacking an
insert indicates that a promoter sequence is present in the inserted upstream sequence.

Promoter sequences within the upstream genomic DNA may be further defined by constructing
nested deletions in the upstream DNA using conventional techniques such as Exonuclease III
digestion. The resulting deletion fragments can be inserted into the promoter reporter vector to
determine whether the deletion has reduced or obliterated promoter activity. In this way, the
boundaries of the promoters may be defined. If desired, potential individual regulatory sites
within the promoter may be identified using site directed mutagenesis or linker scanning to
obliterate potential transcription factor binding sites within the promoter, individually or in
combination. The effects of these mutations on transcription levels may be determined by
inserting the mutations into the cloning sites in the promoter reporter vectors.
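
The reporter-assay logic described above (compare each construct to the insert-less control, then narrow the promoter with nested deletions) can be sketched numerically. All values below are hypothetical, and the two-fold threshold is an arbitrary assumption chosen for illustration, not a criterion stated in the specification.

```python
def fold_over_control(reporter_signal, control_signal):
    """Reporter level of a construct relative to the vector lacking an insert."""
    return reporter_signal / control_signal


def minimal_active_insert(deletion_series, control_signal, threshold=2.0):
    """Return the shortest insert length (bp) whose activity reaches the threshold.

    deletion_series: iterable of (insert_length_bp, reporter_signal) pairs.
    Returns None if no construct is active.
    """
    active = [length for length, signal in deletion_series
              if fold_over_control(signal, control_signal) >= threshold]
    return min(active) if active else None


# Hypothetical nested deletion series sharing a constant 3' end.
series = [(2000, 950.0), (1000, 900.0), (500, 870.0), (200, 820.0), (100, 60.0)]
control = 50.0  # hypothetical signal from the vector lacking an insert

for length, signal in series:
    print(f"{length:>5} bp insert: {fold_over_control(signal, control):.1f}-fold over control")
print("shortest active insert:", minimal_active_insert(series, control), "bp")
```

In this hypothetical series, activity is lost between the 200 bp and 100 bp constructs, which would place a promoter boundary within that interval.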

Thus, the minimal size of the promoter of the TBC-1 gene can be determined through the
measurement of TBC-1 expression levels. For this assay, expression vectors comprising
promoter fragments of decreasing size, generally ranging from 2 kb to 100 bp, with a constant
3' end, operably linked to the TBC-1 coding sequence or to a reporter gene, are used. Cells, which
are preferably prostate cells and more preferably prostate cancer cells, are transfected with these
vectors and the expression level of the gene is assessed.

The strength and the specificity of the promoter of the TBC-1 gene can be assessed through the
expression levels of the gene operably linked to this promoter in different types of cells and
tissues. In one embodiment, the efficacy of the promoter of the TBC-1 gene is assessed in
[...] different degrees of malignancy.

Also part of the present invention is a purified or isolated 5'-UTR nucleic acid comprising a
nucleotide sequence located between the nucleotide at position 1 and the nucleotide at position
170 of SEQ ID No 3, or a biologically active fragment or variant thereof.

Also part of the present invention is a purified or isolated 5'-UTR nucleic acid comprising a
nucleotide sequence located between the nucleotide at position 1 and the nucleotide at position
175 of SEQ ID No 4, or a biologically active fragment or variant thereof.

Also part of the invention is a purified or isolated 3'-UTR nucleic acid comprising a nucleotide
sequence located between the nucleotide at position 3729 and the nucleotide at position 3984
of SEQ ID No 3, or a biologically active fragment or variant thereof.

Also part of the invention is a purified or isolated 3'-UTR nucleic acid comprising a nucleotide
sequence located between the nucleotide at position 3734 and the nucleotide at position 3989
of SEQ ID No 4, or a biologically active fragment or variant thereof.

The 5'-UTR purified or isolated nucleic acids described above may be included in the nucleic 
acid that comprises the nucleotide sequence of SEQ ID No 2 or its biologically active fragments 
and variants. 

Preferred fragments of the nucleic acid of SEQ ID No 2 have a length of about 1000 
nucleotides, more particularly of about 400 nucleotides, more preferably of about 200 
nucleotides and most preferably about 100 nucleotides. 

By a "biologically active fragment or variant" of a TBC-1 regulatory polynucleotide according to
the present invention is intended a polynucleotide comprising or alternatively consisting in a
fragment of said polynucleotide which is functional as a regulatory region for expressing a
recombinant polypeptide or a recombinant polynucleotide in a recombinant cell host.

For the purpose of the invention, a nucleic acid or polynucleotide is "functional" as a regulatory
region for expressing a recombinant polypeptide or a recombinant polynucleotide if said
regulatory polynucleotide contains nucleotide sequences which contain transcriptional and
translational regulatory information, and if such sequences are "operatively linked" to nucleotide
sequences which encode the desired polypeptide or the desired polynucleotide. An operable
linkage is a linkage in which the regulatory nucleic acid and the DNA sequence sought to be
expressed are linked in such a way as to permit gene expression.

More precisely, two DNA molecules (such as a polynucleotide containing a promoter region and
a polynucleotide encoding a desired polypeptide or polynucleotide) are said to be "operably
linked" if the nature of the linkage between the two polynucleotides does not (1) result in the
introduction of a frame-shift mutation or (2) interfere with the ability of the polynucleotide
containing the promoter to direct the transcription of the coding polynucleotide. The promoter
polynucleotide would be operably linked to a polynucleotide encoding a desired polypeptide or a
desired polynucleotide if the promoter is capable of effecting transcription of the polynucleotide
of interest.

In order to identify the relevant biologically active polynucleotide derivatives,
one skilled in the art will refer to the book of Sambrook et al. (Sambrook, J., Fritsch, E. F., and
T. Maniatis, 1989, Molecular cloning: a laboratory manual, 2nd ed., Cold Spring Harbor Laboratory,
Cold Spring Harbor, New York) in order to use a recombinant vector carrying a marker gene
(β-galactosidase, chloramphenicol acetyl transferase, etc.), the expression of which will be
detected when placed under the control of a biologically active derivative polynucleotide of SEQ [...]

Regulatory polynucleotides of the invention may be prepared from any of the nucleotide sequences
of SEQ ID No 1 or SEQ ID Nos 3 and 4 by cleavage using suitable restriction enzymes, the
one skilled in the art being guided by the book of Sambrook et al. (1989).

Regulatory polynucleotides may also be prepared by digestion of any of the nucleotide sequences
of SEQ ID No 1 or SEQ ID No 4 by an exonuclease enzyme, such as Bal31 (Wabiko et al., 1986,
DNA, 5(4):305-314).

Those regulatory polynucleotides can also be prepared by chemical synthesis, as described
elsewhere in the specification, when the synthesis of oligonucleotide probes or primers is
disclosed.

The regulatory polynucleotides according to the invention may advantageously be part of a
recombinant expression vector that may be used to express a coding sequence in a desired host
cell or host organism. The recombinant expression vectors according to the invention are
described elsewhere in the specification.

The invention also encompasses a polynucleotide comprising:

a) a nucleic acid comprising a regulatory nucleotide sequence of SEQ ID No 2, or a biologically
active fragment or variant thereof;
b) a polynucleotide encoding a desired polypeptide or nucleic acid, operably linked to the nucleic
acid comprising a regulatory nucleotide sequence of SEQ ID No 2, or its biologically active
fragment or variant;

c) optionally, a nucleic acid comprising a 3'-UTR regulatory polynucleotide, preferably a 3'-UTR
regulatory polynucleotide of the invention.

In a preferred embodiment, a polynucleotide such as disclosed above comprises the nucleic acid of 
SEQ ID No 2 or a fragment, a variant or a biologically active derivative thereof which is operably 
linked to the 5'end of the polynucleotide encoding the desired polypeptide or polynucleotide. 

In another embodiment, a polynucleotide such as that described above comprises the nucleic acid
of said 3'-UTR sequence or a fragment, a variant or a biologically active derivative thereof which is
located at the 3' end of the polynucleotide encoding the desired polypeptide or nucleic acid, it being
understood that a preferred desired nucleic acid consists of a ribonucleic acid useful as an antisense
molecule.

The desired polypeptide encoded by the above described nucleic acid may be of various nature or
origin, encompassing proteins of prokaryotic or eukaryotic origin. Among the polypeptides that may
be expressed under the control of a TBC-1 regulatory region are bacterial, fungal or viral
antigens. Also encompassed are eukaryotic proteins such as intracellular proteins, like
"housekeeping" proteins, membrane-bound proteins, like receptors, and secreted proteins, like the
numerous endogenous mediators such as cytokines.

The desired nucleic acid encoded by the above described polynucleotide, usually an RNA molecule,
may be complementary to a TBC-1 coding sequence and thus useful as an antisense
polynucleotide.

Such a polynucleotide may be included in a recombinant expression vector in order to express a
desired polypeptide or a desired polynucleotide in a host cell or in a host organism. Suitable
recombinant vectors that contain a polynucleotide such as described hereinbefore are disclosed
elsewhere in the specification.

mRNA molecules of about 4 kilobases isolated by the inventors

More precisely, the effective TBC-1 coding sequence is comprised between the nucleotide at
position 171 and the nucleotide at position 3728 of the nucleotide sequence of SEQ ID No 3, and
between the nucleotide at position 176 and the nucleotide at position 3733 of the nucleotide
sequence of SEQ ID No 4.

The invention further provides a purified or isolated nucleic acid comprising a polynucleotide
selected from the group consisting of a polynucleotide comprising the sequence located
between the nucleotide at position 171 and the nucleotide at position 3728 of SEQ ID No 3, or a
variant or fragment thereof or a sequence complementary thereto.

The above disclosed polynucleotide that contains only coding sequences derived from the TBC-1
ORF may be expressed in a desired host cell or a desired host organism, when said polynucleotide
is placed under the control of suitable expression signals. Such a polynucleotide, when placed
under the suitable expression signals, may be inserted in a vector for its expression.
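
As a quick consistency check on the coordinates quoted above (positions 171 to 3728 of SEQ ID No 3 and positions 176 to 3733 of SEQ ID No 4), the following Python sketch computes the length of a coding interval and slices it out of a sequence. It assumes the positions are 1-based and inclusive, which is the usual sequence-listing convention but is not stated in the text; the helper names `interval_length` and `extract_interval` are hypothetical.

```python
def interval_length(start: int, end: int) -> int:
    """Length of a 1-based, inclusive nucleotide interval (assumed convention)."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return end - start + 1


def extract_interval(sequence: str, start: int, end: int) -> str:
    """Return the nucleotides from `start` to `end`, both 1-based and inclusive."""
    if end > len(sequence):
        raise ValueError("interval runs past the end of the sequence")
    # Python slices are 0-based and end-exclusive, hence start - 1.
    return sequence[start - 1:end]


# Coordinates taken from the text; the sequences themselves are not reproduced.
length_seq_id_3 = interval_length(171, 3728)
length_seq_id_4 = interval_length(176, 3733)
whole_codons = length_seq_id_4 % 3 == 0
```

Both intervals come out at 3558 nucleotides, a multiple of three (1186 codons), so the two sets of coordinates are at least mutually consistent.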

B. IDENTIFICATION OF BIALLELIC MARKERS 

The biallelic markers of the present invention are TBC-1 derived polynucleotides having between
2 and 100, preferably between 20, 30 or 40 and [...], and more preferably about 45 nucleotides in
length, which exhibit biallelic polymorphism at one single base position. Each biallelic marker
therefore corresponds to two forms of a polynucleotide sequence which differ by a nucleotide
modification at one position. Usually, the nucleotide modification involves the
substitution of one nucleotide for another (for example C instead of T).

However, this nucleotide modification can also involve an insertion or a deletion of at least one
nucleotide, preferably between 1 and 5 nucleotides. The nucleotide modification can also
involve several adjacent single base polymorphisms. This type of nucleotide
modification is usually called a "variable motif". Generally, a "variable motif" involves the
presence of 2 to 10 adjacent single base polymorphisms. In some instances, series of two or
more single base polymorphisms can be interrupted by single bases which are not polymorphic.
This is also globally considered to be a "variable motif".

Generation of biallelic markers

There are two preferred methods through which the biallelic markers of the present invention
can be generated. In a first method, DNA samples from unrelated individuals are pooled
together, following which the genomic DNA of interest is amplified and sequenced. The
nucleotide sequences thus obtained are then analyzed to identify significant polymorphisms.

One of the major advantages of this method resides in the fact that the pooling of the DNA
samples substantially reduces the number of DNA amplification and sequencing reactions which
must be carried out. Moreover, this method is sufficiently sensitive so that a biallelic marker
obtained therewith usually shows a sufficient degree of informativeness for conducting
association studies.

In a second method for generating biallelic markers, the DNA samples are not pooled and are
therefore amplified and sequenced individually. The resulting nucleotide sequences obtained
are then also analyzed to identify significant polymorphisms.

It will readily be appreciated that when this second method is used, a substantially higher
number of DNA amplification reactions must be carried out. It will further be appreciated that
including such potentially less informative biallelic markers in association studies to identify
potential genetic associations with a trait may allow in some cases the direct identification of
causal mutations, which may, depending on their penetrance, be rare mutations. This method is
usually preferred when biallelic markers need to be identified in order to perform association
studies within candidate genes.

In both methods, the genomic DNA samples from which the biallelic markers of the present 
invention are generated are preferably obtained from unrelated individuals corresponding to a 
heterogeneous population of known ethnic background, or from familial cases.

The number of individuals from whom DNA samples are obtained can vary substantially,
preferably from about 10 to about 1000, more preferably from about 50 to about 200 individuals.
It is usually preferred to collect DNA samples from at least about 100 individuals in order to have
sufficient polymorphic diversity in a given population to generate as many markers as possible
and to generate statistically significant results.

As for the source of the genomic DNA to be subjected to analysis, any test sample can be
foreseen without any particular limitation. The preferred source of genomic DNA used in the
context of the present invention is the peripheral venous blood of each donor.

The techniques of DNA extraction are well-known to the skilled technician. Details of a preferred
embodiment are provided in Example 2.

DNA samples can be pooled or unpooled for the amplification step. DNA amplification
techniques are well-known to those skilled in the art. Amplification techniques that can be used
in the context of the present invention include, but are not limited to, the polymerase chain
reaction (PCR, RT-PCR), the ligase chain reaction (LCR), and techniques such as the nucleic
acid sequence based amplification (NASBA).

The PCR technology is the preferred amplification technique. It has been described in several
publications including US Patents 4,[...] and 4,965,188, the publication entitled "PCR Methods
and Applications" (1991, Cold Spring Harbor Laboratory Press) and White (199[...]). A
typical example of a PCR reaction suitable for the purposes of the present invention is provided
in Example 2.

One of the aspects of the present invention is a method for the amplification of the TBC-1 gene,
particularly the genomic sequence of SEQ ID No 1, or a fragment or variant thereof, in a test
sample, preferably using the PCR technology. This method comprises the step of contacting a
test sample suspected of containing the target TBC-1 sequence with
amplification reaction reagents comprising a pair of amplification primers.

Thus, the present invention also relates to a method for the amplification of a TBC-1 gene
sequence, particularly of a portion of the genomic sequence of SEQ ID No 1, or a
variant thereof, in a test sample, said method comprising the steps of:

a) contacting a test sample suspected of containing the targeted TBC-1 gene sequence
or portion thereof with amplification reaction reagents comprising a pair of amplification primers
located on either side of the TBC-1 region to be amplified, and
b) detecting the amplification products.

In one specific embodiment of the above amplification method, the amplification primers are
selected from the group consisting of SEQ ID Nos 9-10 and 13-14.

In another embodiment of the above amplification method, the amplification product is detected by
hybridization with a labeled probe having a sequence which is complementary to a region of the
TBC-1 gene, particularly a region of the genomic sequence of SEQ ID No 1.

The primers are more particularly characterized in that they have sufficient complementarity with
any sequence of a strand of the TBC-1 gene close to the region to be amplified, for example with a
sequence of introns adjacent to the exons to be amplified.
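
The requirement that the two amplification primers lie on either side of the region to be amplified can be illustrated with a small in-silico sketch. It is only an approximation under stated assumptions: exact string matching stands in for hybridization, the template and primers are toy sequences rather than SEQ ID Nos 1, 9-10 or 13-14, and the function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_amplicon(template: str, forward: str, reverse: str):
    """Return the product bounded by the primer pair, or None.

    The forward primer must match the given strand; the reverse primer must
    match the opposite strand downstream of it, i.e. its reverse complement
    must occur in `template` after the forward primer site.
    """
    start = template.find(forward)
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse)
    end = template.find(reverse_site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]


# Toy data (hypothetical, for illustration only).
template = "TTGACCTGAAGGCTAGCTAGGATCCGTTAACCGGTT"
forward_primer = "GACCTG"
reverse_primer = "TAACGG"
amplicon = find_amplicon(template, forward_primer, reverse_primer)
```

With real primers, mismatch tolerance and melting temperature would also have to be considered; none of that is modelled here.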

The primers were defined with OSP software (Hillier & Green, 1991). The length of the first
primer can range from 10 to 100 nucleotides, preferably from 10 to 50, 10 to 30 or more
preferably 10 to 25 nucleotides. All primers contained a common upstream oligonucleotide tail
enabling the easy systematic sequencing of the resulting amplification fragments.

Preferred primers of the invention include the nucleotide sequences of SEQ ID Nos 9-10 and 13-14.

The amplification products generated as described above with the primers of the invention are
then sequenced using methods known and available to the skilled technician. Preferably, the
amplified DNA is subjected to automated dideoxy terminator sequencing reactions using a
dye-primer cycle sequencing protocol. Following gel image analysis and DNA sequence extraction,
sequence data are automatically processed with adequate software to assess sequence quality.

Polymorphism analysis software is used that detects the presence of biallelic sites among
individual or pooled amplified fragment sequences. Polymorphism search is based on the
presence of superimposed peaks in the electrophoresis pattern. These peaks, which present
distinct colors, correspond to two different nucleotides at the same position on the sequence.
The polymorphism has to be detected on both strands for validation.
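
The detection rule described above (two superimposed peaks at one position, confirmed on both strands) might be sketched as follows. This is not the software used by the inventors: the peak-height representation, the `min_ratio` threshold of 0.25 and the function names are all assumptions, and the reverse-strand trace is assumed to have been converted beforehand to forward-strand coordinates and bases.

```python
def superimposed_peaks(trace, min_ratio=0.25):
    """Map position -> pair of bases where a second peak is superimposed.

    `trace` is a list with one dict per position, giving a peak height for
    each of the four bases. A site is flagged when the second-highest peak
    reaches `min_ratio` of the highest one (arbitrary illustrative cutoff).
    """
    calls = {}
    for position, peaks in enumerate(trace):
        ranked = sorted(peaks.items(), key=lambda item: item[1], reverse=True)
        (base1, height1), (base2, height2) = ranked[0], ranked[1]
        if height1 > 0 and height2 / height1 >= min_ratio:
            calls[position] = frozenset((base1, base2))
    return calls


def validated_sites(forward_trace, reverse_trace, min_ratio=0.25):
    """Keep only sites showing the same two bases on both strands."""
    forward_calls = superimposed_peaks(forward_trace, min_ratio)
    reverse_calls = superimposed_peaks(reverse_trace, min_ratio)
    return {
        position: alleles
        for position, alleles in forward_calls.items()
        if reverse_calls.get(position) == alleles
    }


# Toy traces (hypothetical peak heights).
forward = [
    {"A": 100, "C": 2, "G": 1, "T": 3},
    {"A": 1, "C": 60, "G": 2, "T": 55},
    {"A": 3, "C": 2, "G": 90, "T": 40},
]
reverse = [
    {"A": 95, "C": 1, "G": 2, "T": 2},
    {"A": 2, "C": 58, "G": 1, "T": 50},
    {"A": 1, "C": 3, "G": 92, "T": 4},
]
sites = validated_sites(forward, reverse)
```

In this toy data the third position shows a second peak on one strand only, so it is discarded, which mirrors the both-strands validation requirement.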

The biallelic marker 99-430-352 of the present invention is disclosed in Table 2 of Example 4b.
Its location on the TBC-1 gene is indicated as a feature in SEQ ID No 1. The pair of amplification
primers is listed in the sequence listing as SEQ ID Nos 9-10, these primers allowing the
amplification of a nucleic acid containing the polymorphic base that defines this biallelic marker.

In the present invention, the biallelic marker 99-430-352 referred to above is defined by two
nucleotide sequences corresponding to oligonucleotides of 47 bases in length comprising at the
middle either the major or the minor allele of the polymorphic base, listed as SEQ ID Nos 7-8.

The 99-430-352 biallelic marker is located within intron 1 of the TBC-1 gene, and forms part of
the present invention.

Biallelic markers in linkage disequilibrium with the preferred markers of the invention

As mentioned before, once an association has been demonstrated between a given biallelic
marker and a trait T, the discovery of additional biallelic markers associated with T and in LD with
one of the biallelic markers disclosed herein can easily be carried out by the skilled person.

The present invention then also concerns biallelic markers in LD with the specific biallelic
markers described above and which are expected to present similar characteristics in terms of
their respective association with a given trait.

Measure of linkage disequilibrium between markers

LD among a set of biallelic markers having a heterozygosity rate of ca. 50% can be determined
by genotyping between 50 and 1000 unrelated individuals. Genotyping is performed through
individual reactions as described above.

LD between any pair of biallelic markers comprising at least one of the biallelic markers of the
present invention (M_i, M_j) can be calculated for every allele combination (M_i1,M_j1; M_i1,M_j2;
M_i2,M_j1 and M_i2,M_j2), according to the Piazza formula:

\[ \Delta_{M_{ik},M_{jl}} = \sqrt{\theta_4} - \sqrt{(\theta_4 + \theta_3)(\theta_4 + \theta_2)} \]

where:

\theta_4 = - - = frequency of genotypes not having allele k at M_i and not having allele l at M_j
\theta_3 = - + = frequency of genotypes not having allele k at M_i and having allele l at M_j
\theta_2 = + - = frequency of genotypes having allele k at M_i and not having allele l at M_j
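
The Piazza formula above can be evaluated directly from the three genotype-class frequencies. The sketch below is a plain transcription of that formula into Python; the function names and the count-based helper are hypothetical, and the numbers in the example are invented for illustration rather than taken from any study.

```python
import math


def piazza_delta(theta4: float, theta3: float, theta2: float) -> float:
    """Piazza LD value for one allele combination (allele k at Mi, l at Mj).

    theta4: frequency of genotypes with neither allele k nor allele l
    theta3: frequency of genotypes without allele k but with allele l
    theta2: frequency of genotypes with allele k but without allele l
    """
    for name, value in (("theta4", theta4), ("theta3", theta3), ("theta2", theta2)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be a frequency between 0 and 1")
    return math.sqrt(theta4) - math.sqrt((theta4 + theta3) * (theta4 + theta2))


def thetas_from_counts(neither: int, l_only: int, k_only: int, both: int):
    """Convert genotype-class counts into (theta4, theta3, theta2)."""
    total = neither + l_only + k_only + both
    if total <= 0:
        raise ValueError("at least one genotype is required")
    return neither / total, l_only / total, k_only / total


# Invented example: 100 individuals, alleles k and l always found together.
theta4, theta3, theta2 = thetas_from_counts(neither=36, l_only=0, k_only=0, both=64)
delta_linked = piazza_delta(theta4, theta3, theta2)

# Invented example: class frequencies expected for two independent markers.
delta_independent = piazza_delta(0.1764, 0.3136, 0.1836)
```

The value is zero (up to rounding) when the two markers are independent and departs from zero as the alleles co-occur more or less often than expected.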

The skilled person will readily appreciate that other LD calculation methods can be used without
undue experimentation.

Hence, once LD has been demonstrated between a trait and a given biallelic marker, all the
biallelic markers shown to be in LD with the given biallelic marker are expected to present
similar characteristics in terms of their respective association with a given trait. These additional
markers, which can be identified and sequenced by the skilled person using the teachings of the
present application, also fall within the scope of the present invention.

An example of identification of additional biallelic markers associated with a trait based on the
previous knowledge of the localization of a first marker associated with a given trait is given below.

Biallelic markers in linkage disequilibrium with a particular marker: Apo E4

The following example relating to the identification of markers in LD with the apoE4 allele is
representative of the procedures of the present invention in which markers in LD with a target
gene are identified. Three major isoforms of human apolipoprotein E (apoE2, -E3, and -E4) have
been identified by isoelectric focusing and are coded for by 3 alleles (ε2, ε3, and ε4) of the Apo E
gene.

As originally reported by Strittmatter et al. and by Saunders et al. in 1993, the Apo E ε4 allele is
strongly associated with both late-onset familial and sporadic Alzheimer's Disease (AD).

Biallelic markers in LD with the Apo E ε4 allele were identified. This example is illustrative of the
general principle that the generation of biallelic markers associated with a trait leads to markers
in LD with any biallelic marker already known to be associated with the trait.

An Apo E marker was used to screen the human genomic BAC library. A BAC, which gave a
unique hybridization signal on chromosomal region 19q13.2.3 by FISH, was selected for finding
biallelic markers as follows.

This BAC contained an insert of 205 kb that was subcloned. Fifty BAC subclones were
randomly selected and sequenced. Twenty-five subclone sequences were selected and used to
design twenty-five pairs of PCR primers that allowed amplicons of approximately 500 bp to
be generated. These PCR primers were then used to amplify the corresponding genomic
sequences in a pool of DNA from 100 individuals (French origin, blood donors) as already
described. Amplification products from pooled DNA were sequenced and analyzed for the
presence of biallelic polymorphisms using the software described herein. Five amplicons were
shown to contain a polymorphic base in the pool of 100 individuals, and therefore these
polymorphisms (99-366/274; 99-344/439; 99-365/344; 99-359/308; 99-355/219) were selected
as the random biallelic markers in the vicinity of the Apo E gene.



An additional pair of primers was designed that allowed amplification of the genomic
fragment carrying the already known polymorphism of Apo E (99-2452/54 C/T).

An association study was then performed. As expected, there was a clear association between
Alzheimer's disease (AD) and the known Apo E4 polymorphism (biallelic marker 99-2452/54), the
C allele frequency being increased by 26 % in the AD case population studied compared to the
AD control population analyzed (p value of this difference = 2x10^[...]).

In addition, the association study with the random markers generated in the vicinity of the Apo E
gene showed that the biallelic marker 99-365/344 C/T is also associated with AD, the T allele
frequency being increased by 17 % in the AD case population with respect to the AD control
population under study (p value of this allele frequency difference = 7x10^[...]). Thus individuals
who possess a T allele at the biallelic marker 99-365/344 are at risk of developing AD.

Among the biallelic markers generated in the Apo E region, 99-365/344 is in LD with the
previously known Apo E4 marker 99-2452/54. The LD is detected in a control population (LD
value = 0.08) and is clearly increased in the AD case population (LD = 0.21). Hence the
generated biallelic marker which is associated with Alzheimer's disease, namely the biallelic
marker 99-365, is in LD with the biallelic marker 99-2452 already known to be associated with
this disease.

C. GENOTYPING OF BIALLELIC MARKERS

The polymorphisms identified above can be further confirmed and their respective frequencies
can be determined through various methods using the previously described primers and probes.
These methods can also be useful for genotyping either new populations in association studies
or individuals in the context of detection of alleles of biallelic markers which are known to be
associated with a given trait. Those skilled in the art should note that the methods described
below can be equally performed on individual or pooled DNA samples.

Once a given polymorphic site has been found and characterized as a biallelic marker as
described above, several methods can be used in order to determine the specific allele carried
by an individual at the given polymorphic base.

The identification of biallelic markers described previously allows the design of appropriate
oligonucleotides, which can be used as probes and primers, to amplify a TBC-1 gene containing
the polymorphic site of interest and for the detection of such polymorphisms.

Amplification 

Most genotyping methods require the previous amplification of the DNA region carrying the 
polymorphic site of interest. Amplification can be performed using the same primers already 
detailed or alternative second primers. 

The invention also concerns alternative second DNA primers, preferably in the form of primer
pairs, characterized in that they preferably comprise more than 8 nucleotides, preferably
between 8 and 100 nucleotides, more preferably between 8 and 50 nucleotides, still more
preferably between 8 and 30 nucleotides, and in that they are sufficiently complementary with a
region of a TBC-1 gene to hybridize therewith. In some embodiments, the primer pair is adapted
for amplifying a sequence containing the polymorphic base of one of the sequences of SEQ ID
Nos 7-8.

For amplification and sequencing, the pairs of primers are sufficiently complementary with a
region of a TBC-1 gene, particularly of the genomic sequence of SEQ ID No 1, located at less
than 500 bp, preferably at less than 100 bp, and more preferably at less than 50 bp of a
polymorphic site corresponding to one of the markers of the present invention.

The technique applied for the amplification of a polymorphic TBC-1 gene or
fragments thereof in a sample using the second primers of the invention can be selected from
the techniques described above for the amplification of the TBC-1 gene.

These second primers can be used, for example, for specific amplification experiments. In these
experiments, at least one primer is sufficiently complementary with a region of a TBC-1 gene
comprising the considered polymorphic site corresponding to one of the markers of the present
invention to hybridize therewith, such that said primer is able to initiate the specific amplification
of one allele of the biallelic marker. In some embodiments, the biallelic markers are comprised in
one of the sequences of SEQ ID Nos 7-8.

Sequencing

The amplification products generated above with the primers of the invention can be sequenced
using methods known and available to the skilled technician. Preferably, the amplified DNA is
subjected to automated dideoxy terminator sequencing reactions using a dye-primer cycle
sequencing protocol. A sequence analysis can allow the identification of the base present at the
polymorphic site.

It is to be noted that amplification primers such as those of SEQ ID Nos 9-10 or 13-14 can be
immobilized on an appropriate solid support prior to amplification. Various techniques of solid-phase
immobilization of nucleotide sequences are disclosed below; a preferred technique is described in
PCT application WO 96/13609, incorporated herein by reference.

Microsequencing

Polymorphism analyses on pools or selected individuals of a given population can be carried out
by conducting microsequencing reactions on candidate regions comprised in amplified
fragments obtained by PCR performed on DNA or RNA samples taken from these individuals.

To do so, DNA samples are subjected to PCR amplification of the candidate regions under
conditions similar to those described above. These amplification products are then subjected to
automated microsequencing reactions using ddNTPs (specific fluorescence for each ddNTP)
and appropriate oligonucleotide microsequencing primers which can hybridize just upstream of
the polymorphic base of interest. Once specifically extended at the 3' end by a DNA polymerase
using a complementary fluorescent dideoxynucleotide analog (thermal cycling), the primer is
precipitated to remove the unincorporated fluorescent ddNTPs. The reaction products in which
fluorescent ddNTPs have been incorporated are then analyzed by electrophoresis on ABI 377
sequencing machines.

Microsequencing primers hybridize upstream of the polymorphic base to be genotyped, either
with the coding or with the non-coding strand. Preferably, the 3' end of the microsequencing
primer is immediately upstream of the polymorphic base of the biallelic marker being genotyped,
such that upon extension of the primer, the polymorphic base is the first base incorporated. The
microsequencing primers may be oligonucleotides of 10, 15, 20 or more bases in length.

In a preferred embodiment, the microsequencing primers are those indicated as features within
the sequence listings corresponding to markers of SEQ ID Nos 7-8.
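
To make the positioning rule for microsequencing primers concrete, the sketch below derives, for either strand, a primer whose 3' end sits immediately upstream of the polymorphic base, so that the first base incorporated reports the allele. The 47-base marker used here is a made-up sequence with the polymorphic base at position 24, not SEQ ID No 7 or 8, and the function names and the 10-base primer length are illustrative assumptions.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def microsequencing_primers(marker: str, site: int, length: int = 10):
    """Return (upstream_primer, downstream_primer) flanking a polymorphic base.

    `site` is the 1-based position of the polymorphic base in `marker`.
    The upstream primer anneals to the complementary strand and ends just 5'
    of the site; the downstream primer is built on the opposite strand and
    ends just 3' of the site (both written 5' to 3').
    """
    index = site - 1
    if index - length < 0 or index + 1 + length > len(marker):
        raise ValueError("not enough flanking sequence for this primer length")
    upstream = marker[index - length:index]
    downstream = reverse_complement(marker[index + 1:index + 1 + length])
    return upstream, downstream


def first_base_incorporated(allele: str, use_downstream_primer: bool) -> str:
    """ddNTP expected to extend the primer for a given allele on the marker strand."""
    return allele.translate(_COMPLEMENT) if use_downstream_primer else allele


# Hypothetical 47-mer: 23 bases, polymorphic base (C here), 23 bases.
left_flank = "ACGTTGCAAGCTTAGGCATCGAT"
right_flank = "TGCAGGATCCTTAGCAAGTCCGA"
marker = left_flank + "C" + right_flank
upstream_primer, downstream_primer = microsequencing_primers(marker, site=24)
```

Extending the upstream primer incorporates the allele base itself, whereas extending the downstream primer incorporates its complement, which is why either strand can be used for genotyping.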



An example of a typical microsequencing procedure that can be used in the context of the
present invention is provided in Example 5. It is to be understood that certain parameters of this
procedure such as the electrophoresis method or the labeling of ddNTPs could be modified by
the skilled person without substantially modifying its result.

As a further alternative to the process described above, several solid phase microsequencing
reactions have been developed. The basic microsequencing protocol is the same as described
previously, except that either the oligonucleotide microsequencing primers or the PCR-amplified
products of the DNA fragment of interest are immobilized. For example, immobilization can be
carried out via an interaction between biotinylated DNA and streptavidin-coated microtitration
wells or avidin-coated polystyrene particles.

In such solid phase microsequencing reactions, incorporated ddNTPs can either be radiolabeled
(see Syvanen, 1994, incorporated herein by reference) or linked to fluorescein (see Livak &
Hainer, 1994, incorporated herein by reference). The detection of radiolabeled ddNTPs can be
achieved through scintillation-based techniques. The detection of fluorescein-linked ddNTPs
can be based on the binding of antifluorescein antibody conjugated with alkaline phosphatase,
followed by incubation with a chromogenic substrate (such as p-nitrophenyl phosphate).

Other possible reporter-detection couples include:

- ddNTP linked to dinitrophenyl (DNP) and anti-DNP alkaline phosphatase conjugate (see Harju
et al., 1993, incorporated herein by reference)

- biotinylated ddNTP and horseradish peroxidase-conjugated streptavidin with
o-phenylenediamine as a substrate (see WO 92/15712, incorporated herein by reference).

A diagnosis kit based on fluorescein-linked ddNTP with antifluorescein antibody conjugated with
alkaline phosphatase is commercialized under the name PRONTO by GamidaGen Ltd.

As yet another alternative microsequencing procedure, Nyren et al. (1993) presented a concept
of solid-phase DNA sequencing that relies on the detection of DNA polymerase activity by an
enzymatic luminometric inorganic pyrophosphate detection assay (ELIDA). The PCR-amplified
products are biotinylated and immobilized on beads. The microsequencing primer is annealed
and four aliquots of this mixture are separately incubated with DNA polymerase and one of the
four different ddNTPs. After the reaction, the resulting fragments are washed and used as
substrates in a primer extension reaction with all four dNTPs present. The progress of the
DNA-directed polymerization reactions is monitored with the ELIDA. Incorporation of a ddNTP in the
first reaction prevents the formation of pyrophosphate during the subsequent dNTP reaction. In
contrast, no ddNTP incorporation in the first reaction gives extensive pyrophosphate release
during the dNTP reaction and this leads to generation of light throughout the ELIDA reactions.
From the ELIDA results, the first base after the primer is easily deduced.
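
The deduction step of the ELIDA readout can be written down as a small decision rule: the aliquot whose ddNTP was incorporated gives little or no light in the subsequent dNTP reaction, while the other three give a strong signal. The sketch below encodes only that logic; the light values, the 0.5 cutoff and the function name are illustrative assumptions and are not taken from Nyren et al.

```python
def deduce_first_base(light_by_ddntp, cutoff=0.5):
    """Return the set of ddNTPs that appear to have been incorporated.

    `light_by_ddntp` maps each of the four ddNTPs ("A", "C", "G", "T") to the
    light signal measured during the follow-up dNTP reaction of its aliquot.
    A ddNTP is taken as incorporated when its signal falls below `cutoff`
    times the strongest signal (arbitrary illustrative threshold).
    """
    if set(light_by_ddntp) != set("ACGT"):
        raise ValueError("expected one reading for each of A, C, G and T")
    strongest = max(light_by_ddntp.values())
    if strongest <= 0:
        raise ValueError("no light detected in any aliquot")
    return {
        base
        for base, signal in light_by_ddntp.items()
        if signal / strongest < cutoff
    }


# Invented readings: only the ddCTP aliquot stays dark.
readings = {"A": 980.0, "C": 35.0, "G": 1010.0, "T": 995.0}
incorporated = deduce_first_base(readings)
```

The base deduced this way is the one added to the primer; the corresponding template base is its complement.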

Probes and hybridization 

The invention also relates to a group of probes characterized in that they preferably comprise
between 8 and 50 nucleotides, and in that they are sufficiently complementary to a polymorphic
sequence defined by a biallelic marker located in the genomic sequence of TBC-1 to hybridize
thereto and preferably sufficiently specific to be able to discriminate the targeted sequence for
only one nucleotide variation.

The length of these probes can range from 8, 10, 15, 20, or 30 to 100 nucleotides, preferably
from 8 to 50, more preferably from 40 to 50 nucleotides. Particularly preferred probes range in
length between 40 and 50 nucleotides, for example 47 nucleotides in length. They include a
centrally located (for example at position 24) nucleotide complementary to a polymorphic site of
the TBC-1 gene, preferably a polymorphic site corresponding to one of the biallelic markers of
the present invention, such as the polymorphic base in the sequences of SEQ ID Nos 7-8, and a
20 to 25 (for example 22) nucleotide sequence spanning on each side of the central nucleotide
and substantially complementary to the nucleotide sequences of the TBC-1 gene spanning on
each side of the polymorphic site.
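
The idea of a probe that is centred on the polymorphic base and discriminates a single nucleotide variation can be sketched as follows. Each allele gets its own probe, fully complementary to one form of the target, so the probe for the other allele carries exactly one central mismatch. The flanks are short made-up sequences (not those of SEQ ID Nos 7-8), counting mismatches is only a crude stand-in for hybridization stringency, and the function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def allele_probes(left_flank: str, alleles: str, right_flank: str):
    """Build one probe per allele, complementary to flank + allele + flank."""
    return {
        allele: reverse_complement(left_flank + allele + right_flank)
        for allele in alleles
    }


def mismatches(probe: str, target: str) -> int:
    """Number of mismatched positions when `probe` anneals to `target`."""
    if len(probe) != len(target):
        raise ValueError("probe and target must have the same length")
    perfect_probe = reverse_complement(target)
    return sum(1 for got, want in zip(probe, perfect_probe) if got != want)


# Hypothetical C/T marker with 8-base flanks on each side.
left_flank = "ACGTTGCA"
right_flank = "GGATCCTA"
probes = allele_probes(left_flank, "CT", right_flank)
target_with_t = left_flank + "T" + right_flank
```

Under sufficiently stringent conditions only the perfectly matched probe would be expected to give a signal, which is the discrimination the text refers to.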

The probes are generally labeled with a radioactive element (32P, 35S, 3H, 125I) or with a
non-isotopic molecule (for example, biotin, acetylaminofluorene, digoxigenin, 5-bromodeoxyuridine,
fluorescein).

Examples of non-radioactive labeling of nucleic acid fragments are described in the French
patent No FR-7810975 or by Urdea et al. (Urdea M.S., 1988, Nucleic Acids Research, Vol. 11:
4937-4957) or Sanchez-Pescador et al. (Sanchez-Pescador R., 1988, J. Clin. Microbiol., Vol.
26(10): 1934-1938).

Advantageously, the probes according to the present invention may have structural
characteristics such that they allow signal amplification, such structural characteristics
being, for example, those of the branched DNA probes described by Urdea et al. in 1991 or in the
European patent No EP-0225,807 (Chiron).

The probes are preferably directly labeled such as with isotopes, reporter molecules or
fluorescent labels or indirectly labeled such as with biotin to which a streptavidin complex may
later bind. Probe labeling techniques are well-known to the skilled technician. By assaying the
presence of the probe, one can detect the presence or absence of the targeted DNA sequence
in a given sample. The same labels can be used with primers.

The probes of the present invention are useful for a number of purposes. They can be used in
Southern hybridization to genomic DNA or Northern hybridization to mRNA. The probes can
also be used to detect PCR amplification products. They may also be used to detect
mismatches in the TBC-1 gene or mRNA using other techniques. The probes are
complementary to the TBC-1 gene coding sequences, to introns, and to regulatory sequences
such as promoter, and most preferably to sequences comprised in the genomic sequence of
SEQ ID No 1.

Any of the primers and probes of the present invention can be conveniently immobilized on a
solid support. Solid supports are known to those skilled in the art and include the walls of wells
of a reaction tray, test tubes, polystyrene beads, magnetic beads, nitrocellulose strips,
membranes, microparticles such as latex particles, sheep (or other animal) red blood cells,
duracytes and others. The "solid phase" is not critical and can be selected by one skilled in the
art. Thus, latex particles, microparticles, magnetic or non-magnetic beads, membranes, plastic
tubes, walls of microtiter wells, glass or silicon chips, sheep (or other suitable animal's) red
blood cells and duracytes are all suitable examples.

Suitable methods for immobilizing nucleic acids on solid phases include ionic, hydrophobic,
covalent interactions and the like. A "solid phase", as used herein, refers to any material which
is insoluble, or can be made insoluble by a subsequent reaction. The solid phase can be chosen
for its intrinsic ability to attract and immobilize the capture reagent.

Alternatively, the solid phase can retain an additional receptor which has the ability to attract
and immobilize the capture reagent. The additional receptor can include a charged substance
that is oppositely charged with respect to the capture reagent itself or to a charged substance
conjugated to the capture reagent.



As yet another alternative, the receptor molecule can be any specific binding member which is 
immobilized upon (attached to) the solid phase and which has the ability to immobilize the 
capture reagent through a specific binding reaction. The receptor molecule enables the indirect 
binding of the capture reagent to a solid phase material before the performance of the assay or
during the performance of the assay. The solid phase thus can be a plastic, derivatized plastic,
magnetic or non-magnetic metal, glass or silicon surface of a test tube, microtiter well, sheet,
bead, microparticle, chip, sheep (or other suitable animal's) red blood cells, duracytes and other
configurations known to those of ordinary skill in the art. 

Polymorphisms can be analyzed and the frequency of corresponding alleles quantified through 
hybridization reactions on amplified TBC-1 sequences. The amplification reaction can be carried
out as described previously. The hybridization probes which can be conveniently used in such
reactions preferably include the probes defined above as being sufficiently complementary to a
polymorphic site defined by one of the biallelic markers located in the genomic sequence of
TBC-1 to hybridize thereto and sufficiently specific to be able to discriminate the targeted
sequence for only one nucleotide variation.

The amplification reagents, detection probes and test sample are preferably placed under 
amplification conditions whereby, in the presence of the target TBC-1 sequence, copies of the
target TBC-1 sequence (an amplicon) are produced. The probes can also be incorporated into
the sample after the amplification reaction has taken place. In the usual case, the amplicon is
double stranded because primers are provided to amplify a target TBC-1 sequence and its
complementary strand. Upon formation of the single stranded amplicon members, the mixture is 
cooled to allow the formation of complexes between the probes and single stranded amplicon 
members. 

After the probe/amplicon member hybrids are formed, they are detected. Standard 
heterogeneous assay formats are suitable for detecting the hybrids using the labels present on 
the primers and probes. Preferably, the hybrids can be bound to a solid phase reagent by virtue 
of a capture label and detected by virtue of a detection label. In cases where the detection label 
is directly detectable, the presence of the hybrids on the solid phase can be detected by causing 
the label to produce a detectable signal, if necessary, and detecting the signal. In cases where 
the label is not directly detectable, the captured hybrids can be contacted with a conjugate,
which generally comprises a binding member attached to a directly detectable label. The
conjugate becomes bound to the complexes and the conjugate's presence on the complexes
can be detected with the directly detectable label. Thus, the presence of the hybrids on the solid 
phase reagent can be determined. Those skilled in the art will recognize that wash steps may 
be employed to wash away unhybridized amplicon or probe as well as unbound conjugate. 

While the amplification primers initiate amplification of the target TBC-1 sequence, the detection 
(or hybridization) probe is not involved in amplification. Detection probes are generally nucleic 
acid sequences or uncharged nucleic acid analogs such as, for example, peptide nucleic acids
which are disclosed in International Patent Application WO 92/20702; morpholino analogs which
are described in U.S. Patents Numbered 5,185,444, 5,034,506, and 5,142,047; and the like.
Depending upon the type of label carried by the probe, the probe is employed to capture or
detect the amplicon generated by the amplification reaction. The probe is not involved in
amplification of the target sequence and therefore may have to be rendered "non-extendable" in
that additional dNTPs cannot be added to the probe. In and of themselves analogs usually are
non-extendable and nucleic acid probes can be rendered non-extendable by modifying the 3'
end of the probe such that the hydroxyl group is no longer capable of participating in elongation.
For example, the 3' end of the probe can be functionalized with the capture or detection label to
thereby consume or otherwise block the hydroxyl group. Alternatively, the 3' hydroxyl group
simply can be cleaved, replaced or modified. U.S. Patent Application Serial No. 07/049,061 filed
April 19, 1993 describes modifications which can be used to render a probe non-extendable.

Accordingly, the ratio of primers to probes is not important. Thus, either the probes or primers 
can be added to the reaction mixture in excess whereby the concentration of one would be 
greater than the concentration of the other. Alternatively, primers and probes can be employed
in equivalent concentrations. Preferably, however, the primers are added to the reaction
mixture in excess of the probes. Thus, primer to probe ratios of, for example, 5:1 and 20:1 are 
preferred. 

Consequently, the invention also deals with a method for detecting the presence of a nucleic 
acid comprising a nucleotide sequence selected from the group consisting of SEQ ID Nos 1-4 
and 7-8 in a sample, said method comprising the following steps:

a) bringing into contact a nucleic acid probe or a plurality of nucleic acid probes which can
hybridize to a nucleotide sequence included in one of the nucleic acids of SEQ ID Nos 1-4 and
7-8 and the sample to be assayed; and

b) detecting the hybrid complex formed between the probe or the plurality of probes and the 
nucleic acid in the sample. 

In a first preferred embodiment of the above method, said nucleic acid probe or the plurality of 
nucleic acid probes is selected from the group consisting of SEQ ID Nos 9-10 and 11-12.

In a second preferred embodiment of this detection method, said nucleic acid probe or the 
plurality of nucleic acid probes are labeled with a detectable molecule. 

In a third preferred embodiment of said method, said nucleic acid probe or the plurality of 
nucleic acid probes has been immobilized on a substrate. 

The invention further concerns a kit for detecting the presence of a nucleic acid comprising a 
nucleotide sequence selected from the group consisting of SEQ ID Nos 1-4 and 11-12 in a 
sample, said kit comprising : 

a) a nucleic acid probe or a plurality of nucleic acid probes which can hybridize to a nucleotide
sequence included in one of the nucleic acids of SEQ ID Nos 1-4 and 7-8; 

b) optionally, the reagents necessary for performing the hybridization reaction. 

The nucleic acid probe or the plurality of nucleic acid probes that are included in the detection 
kit described above may be selected from the group consisting of SEQ ID Nos 9-10 and 11-12. 

In a first preferred embodiment of the detection kit, said nucleic acid probe or the plurality of 
nucleic acid probes are labeled with a detectable molecule. 

In a second preferred embodiment of the detection kit, said nucleic acid probe or the plurality of
nucleic acid probes has been immobilized on a substrate. 

a) DNA chips technology and hybridization 

DNA chips result from the adaptation of computer chips to biology. They allow the integration of 
micro-biochemical processes (such as DNA hybridization), systems of signal detection (such as 
fluorescence) and data processing into a single system which can be used to obtain information
on polymorphism. 



Efficient access to polymorphism information is obtained through a basic structure comprising 
high-density arrays of oligonucleotide probes attached to a solid support (the chip) at selected
positions. Each DNA chip can contain thousands to millions of individual synthetic DNA probes 
arranged in a grid-like pattern and miniaturized to the size of a dime. 

The immobilization of arrays of DNA probes on solid supports has been rendered possible by
the development of a technology generally identified as "Very Large Scale Immobilized Polymer
Synthesis" (VLSIPS™) in which, typically, probes are immobilized in a high density array on a
solid surface of a chip. Examples of VLSIPS™ technologies are provided in US Patents
5,143,854 and 5,412,087 and in PCT Publications WO 90/15070, WO 92/10092 and WO
95/11995, the disclosures of which are incorporated herein by reference, which describe
methods for forming oligonucleotide arrays through techniques such as light-directed synthesis
techniques.

In designing strategies aimed at providing arrays of nucleotides immobilized on solid supports,
further presentation strategies were developed to order and display the probe arrays on the
chips in an attempt to maximize hybridization patterns and sequence information. Examples of
such presentation strategies are disclosed in PCT Publications WO 94/12305, WO 94/11530,
WO 97/29212 and WO 97/31256, the disclosures of which are incorporated herein by reference.

Chip technology has already been applied with success in numerous cases. For example,
the screening of mutations has been undertaken in the BRCA1 gene, in S. cerevisiae mutant
strains, and in the protease gene of HIV-1 virus (see Hacia et al., 1996; Shoemaker et al., 1996;
Kozal et al., 1996, incorporated herein by reference).

At least three companies offer chips able to detect biallelic polymorphisms: Affymetrix
(GeneChip), Hyseq (HyChip and HyGnostics), and Protogene Laboratories.

One of the limitations encountered when using DNA chip technology is that hybridization of 
nucleic acids with the probes attached to the chip in arrays is not simply a solution-phase 
reaction. A possible improvement consists in using polyacrylamide gel pads isolated from one 
another by hydrophobic regions in which the DNA probes are covalently linked to an acrylamide
matrix. 

For the detection of polymorphisms, probes which contain at least a portion of one of the
biallelic markers of the present invention, such as the biallelic markers of SEQ ID Nos. 7-8, are
synthesized either in situ or by conventional synthesis and immobilized on an appropriate chip
using methods known to the skilled technician. The solid surface of the chip is often made of
silicon or glass but it can be a polymeric membrane. Thus, in some embodiments, the chip may
comprise an array including at least one of the sequences selected from the group consisting of
SEQ ID Nos. 7-8 and 11-12, or the sequences complementary thereto, or a fragment thereof of at
least 15 consecutive nucleotides. In some embodiments, the chip may comprise an array of at
least 2, 3, 4, 5, 6, 7, 8 or more sequences selected from the group consisting of SEQ ID Nos. 7-8
and 11-12, or the sequences complementary thereto, or a fragment thereof of at least 15
consecutive nucleotides.

The nucleic acid sample which includes the candidate region to be analyzed is isolated, 
amplified and labeled with a reporter group. This reporter group can be a fluorescent group such 
as phycoerythrin. The labeled nucleic acid is then incubated with the probes immobilized on the 
chip using a fluidics station. For example, Manz et al. (1993, the disclosure of which is
incorporated herein by reference) describe the fabrication of fluidics devices and particularly 
microcapillary devices, in silicon and glass substrates. 

After the reaction is completed, the chip is inserted into a scanner and patterns of hybridization
are detected. The hybridization data is collected as a signal emitted from the reporter groups
already incorporated into the nucleic acid, which is now bound to the probes attached to the
chip. Probes that perfectly match a sequence of the nucleic acid sample generally produce
stronger signals than those that have mismatches. Since the sequence and position of each 
probe immobilized on the chip is known, the identity of the nucleic acid hybridized to a given 
probe can be determined. 


For single-nucleotide polymorphism analyses, sets of four oligonucleotide probes (one for each
base type), preferably sets of two oligonucleotide probes (one for each base type of the biallelic
marker), are generally designed that span each position of a portion of the candidate region
found in the nucleic acid sample, differing only in the identity of the polymorphic base. The
relative intensity of hybridization to each series of probes at a particular location allows the
identification of the base corresponding to the polymorphic base of the probe. Since biallelic
polymorphism detection involves identifying single-base mismatches on the nucleic acid 
sample, greater hybridization stringencies are required (at lower salt concentration and higher 
temperature over shorter time periods). 
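
The probe-set design described above can be illustrated with a minimal Python sketch. It is not the procedure of the invention: the function names, the flank length and the example sequence are hypothetical, the sequence is made up rather than taken from SEQ ID Nos. 7-8, and strand choice and hybridization chemistry are ignored. The sketch only shows probes that differ solely at the polymorphic base, and a base call made from the strongest signal.

```python
def allele_specific_probes(context, snp_index, alleles, flank=12):
    """Return one probe per allele; probes differ only at the polymorphic base.

    context   -- sequence surrounding the marker (hypothetical input)
    snp_index -- 0-based position of the polymorphic base in `context`
    alleles   -- iterable of bases, e.g. "AG" for a biallelic marker
    flank     -- number of bases kept on each side (an assumed value)
    """
    if not 0 <= snp_index < len(context):
        raise ValueError("snp_index lies outside the context sequence")
    start = max(0, snp_index - flank)
    end = min(len(context), snp_index + flank + 1)
    probes = {}
    for allele in alleles:
        # Substitute the allele at the polymorphic position, then trim.
        variant = context[:snp_index] + allele + context[snp_index + 1:]
        probes[allele] = variant[start:end]
    return probes


def call_base(intensities):
    """Return the allele whose probe gave the strongest hybridization signal."""
    return max(intensities, key=intensities.get)


# Made-up 29-base context; position 14 is treated as the polymorphic base.
context = "ACGTACGTACGTAGCTAGCTAGCTAGCTA"
probes = allele_specific_probes(context, snp_index=14, alleles="AG", flank=5)
```

With a set of four probes, `alleles="ACGT"` would be passed instead; the two-probe form corresponds to the preferred biallelic case.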


The use of direct electric field control improves the determination of single base mutations 
(Nanogen). A positive field increases the transport rate of negatively charged nucleic acids and 
results in a 10-fold increase of the hybridization rates. Using this technique, single base pair 
mismatches are detected in less than 15 sec (see Sosnowski et al., 1997, the disclosure of 
which is incorporated herein by reference).

The invention is also directed to an array of nucleic acid sequences, said array comprising at
least one of the sequences selected from the group consisting of SEQ ID Nos 7-8, 9-10 and 11-12
or the sequences complementary thereto or a fragment thereof of at least 8 consecutive
nucleotides.

In a preferred embodiment of the array described above, this array includes therein at least two 
of the sequences selected from the group consisting of SEQ ID Nos 7-8, 9-10 and 11-12 or the
sequences complementary thereto or a fragment of at least 8 consecutive nucleotides thereof. 


b) Integrated microsequencing and capillary electrophoresis chips

Another technique which can be used to analyze polymorphisms includes multicomponent
integrated systems which miniaturize and compartmentalize processes such as restriction
enzyme, PCR, and capillary electrophoresis reactions in a single functional device. An example
of such a technique is disclosed in US patent 5,589,136, the disclosure of which is incorporated
herein by reference, which concerns the integration of PCR amplification and capillary
electrophoresis in chips. 

Integrated systems can be envisaged mainly when microfluidic systems are used. These 
systems comprise a pattern of microchannels designed onto a glass, silicon, quartz, or plastic
wafer included on a microchip. The movements of the samples are controlled by electric forces
applied across different areas of the microchip to create functional microscopic valves and
pumps with no moving parts. Regulating or varying the voltage controls the liquid flow at
intersections between the micro-machined channels and changes the liquid flow rate for
pumping across different sections of the microchip.

In the case of biallelic marker analyses, the microchip integrates nucleic acid amplification, a
microsequencing reaction (such as the one described above), capillary electrophoresis and a
detection method such as laser-induced fluorescence detection. 

In a first step, the DNA samples are amplified, preferably by PCR. Then, the amplification
products are subjected to automated microsequencing reactions using ddNTPs (specific
fluorescence for each ddNTP) and the appropriate oligonucleotide microsequencing primers
which hybridize just upstream of the targeted polymorphic base. Once the extension at the 3'
end is completed, the primers are separated from the unincorporated fluorescent ddNTPs by
capillary electrophoresis. The separation medium used in capillary electrophoresis can for
example be polyacrylamide, polyethyleneglycol or dextran. The incorporated ddNTPs in the
single-nucleotide primer extension products are identified by fluorescence detection. This
microchip can be used to process at least 96 to 384 samples in parallel. It can use the usual four
color laser induced fluorescence detection of the ddNTPs. 
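
As an illustration only, the final read-out of such a microsequencing reaction could be turned into a genotype along the following lines. The function name, the signal dictionary and the `het_ratio` threshold are assumptions made for this sketch; the specification does not state how fluorescence signals are converted into genotype calls.

```python
def call_genotype(signals, alleles, het_ratio=0.3):
    """Call a diploid genotype at a biallelic marker from ddNTP signals.

    signals   -- fluorescence intensity per incorporated ddNTP base
    alleles   -- the two alternative bases of the marker, e.g. "AG"
    het_ratio -- assumed cut-off: a sample is called heterozygous when
                 each allele carries more than this share of the signal
    """
    first, second = alleles
    s_first = signals.get(first, 0.0)
    s_second = signals.get(second, 0.0)
    total = s_first + s_second
    if total <= 0:
        return None  # no usable signal, no call
    share = s_first / total
    if share >= 1 - het_ratio:
        return first + first
    if share <= het_ratio:
        return second + second
    return first + second
```

In practice the threshold would have to be calibrated on control samples of known genotype; the value used here is arbitrary.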

Thus, the chips may comprise an array comprising at least one of the sequences selected from
the group consisting of SEQ ID Nos. 7-8, or the sequences complementary thereto or a
fragment thereof of at least 8 consecutive nucleotides for determining whether a sample
contains one or more alleles of the biallelic markers of SEQ ID Nos. 7-8. The chips may also
comprise an array comprising at least one of the sequences selected from the group consisting
of SEQ ID Nos. 9-10 or the sequences complementary thereto or a fragment thereof of at least
8 consecutive nucleotides for amplifying one or more alleles of the biallelic markers of SEQ ID
Nos. 7-8. In other embodiments, the chips may also comprise an array comprising at least one
of the sequences selected from the group consisting of SEQ ID Nos. 9-10 or the sequences
complementary thereto or a fragment thereof of at least 8 consecutive nucleotides for
conducting microsequencing analyses to determine whether a sample contains one or more
alleles of the biallelic markers of SEQ ID Nos. 7-8. In still further embodiments, the chip may
comprise an array comprising at least one of the sequences selected from the group
consisting of SEQ ID Nos. 11-12 or the sequences complementary thereto or a fragment thereof
of at least 8 consecutive nucleotides for determining whether a sample contains one or more
alleles of the biallelic markers of SEQ ID Nos. 7-8.

c) Other methods

Other methods well known to those skilled in the art can also be used to detect biallelic
polymorphisms, such as single strand conformational polymorphism analysis (SSCP) described
in Orita et al. (1989), the disclosure of which is incorporated herein by reference, denaturing
gradient gel electrophoresis (DGGE), heteroduplex analysis, mismatch cleavage detection, and
other conventional techniques as described in Sheffield, V.C. et al. (1991), White et al. (1992),
Grompe et al. (1989), and Grompe (1993), the disclosures of which are incorporated herein by 
reference. These methods are well known to those skilled in the art. 

The biallelic markers according to the present invention may be used in methods for the
identification and characterization of an association between alleles for one or several biallelic
markers of the sequence of the TBC-1 gene and a trait.

The identified polymorphisms, and consequently the biallelic markers of the invention, may be used
in methods for the detection in an individual of TBC-1 alleles associated with a trait, more
particularly a trait related to a cell differentiation or abnormal cell proliferation disorders, and most
particularly a trait related to cancer diseases, specifically prostate cancer. 



D- ASSOCIATION STUDIES

As mentioned before, the identification of genes involved in polygenic traits such as prostate 
cancer susceptibility or response to treatment against prostate cancer, can be carried out by 
performing association analyses. As will be described below, the purpose of association
studies is to compare the frequencies of biallelic markers in trait positive and trait negative
populations. If a trait-causing allele (TCA) is associated with a trait T, then the frequency of this
particular TCA, as well as that of all alleles in LD with that particular TCA, will be significantly
higher in trait + than in trait - individuals.

The general strategy to perform association studies using biallelic markers derived from a
candidate gene is to scan two groups of individuals (trait + and trait - control individuals, which
are characterized by a well defined phenotype as described below) in order to measure and
statistically compare the allele frequencies of such biallelic markers in both groups.
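
A minimal Python sketch of such a comparison is given below. It assumes genotypes are recorded as two-letter strings and uses a Pearson chi-square test on a 2x2 table of allele counts (one degree of freedom, no continuity correction); the specification does not prescribe this particular test, and all function names and the example counts are hypothetical.

```python
import math


def allele_counts(genotypes, allele):
    """Count copies of `allele` and of the other allele in diploid genotypes."""
    hits = sum(g.count(allele) for g in genotypes)
    return hits, 2 * len(genotypes) - hits


def allele_association(cases, controls, allele):
    """Compare the frequency of `allele` in trait + and trait - groups."""
    a, b = allele_counts(cases, allele)     # trait +: allele, other
    c, d = allele_counts(controls, allele)  # trait -: allele, other
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column of the 2x2 table is empty")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return {
        "freq_cases": a / (a + b),
        "freq_controls": c / (c + d),
        "chi2": chi2,
        "p_value": p_value,
    }


# Made-up genotype counts, for illustration only.
cases = ["AA"] * 30 + ["AG"] * 50 + ["GG"] * 20
controls = ["AA"] * 10 + ["AG"] * 40 + ["GG"] * 50
result = allele_association(cases, controls, "A")
```

A small p-value indicates only that allele frequencies differ between the two groups; whether the allele is the TCA or merely in LD with it is a separate question.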

If a statistically significant association with a trait is identified for one or more of the
analyzed biallelic markers, one can assume that either the associated allele is directly
responsible for causing the trait (associated allele is the TCA), or the associated allele is in LD
with the TCA. The specific characteristics of the associated allele with respect to the candidate
gene function usually give further insight into the relationship between the associated allele
and the trait (causal or in LD). If the evidence indicates that the associated allele within the
candidate gene is most probably not the TCA but is in LD with the real TCA, then the TCA can
be found by sequencing the vicinity of the associated marker. 

Collection of DNA samples from trait positive (trait +) and trait negative (trait -)
individuals (inclusion criteria)

In order to perform efficient and significant association studies such as those described herein,
the trait under study should preferably follow a bimodal distribution in the population under
study, presenting two clear non-overlapping phenotypes, trait + and trait -.

Nevertheless, even in the absence of such a bimodal distribution (as may in fact be the case for 
more complex genetic traits), any genetic trait may still be analyzed by the association method 
proposed here by carefully selecting the individuals to be included in the trait + and trait -
phenotypic groups. The selection procedure involves selecting individuals at opposite ends of
the non-bimodal phenotype spectra of the trait under study, so as to include in these trait + and
trait - populations individuals which clearly represent extreme, preferably non-overlapping
phenotypes. 






The definition of the inclusion criteria for the trait + and trait - populations is an important aspect
of the present invention. The selection of drastically different but relatively uniform phenotypes
enables efficient comparisons in association studies and the possible detection of marked
differences at the genetic level, provided that the sample sizes of the populations under study
are large enough.

Generally, trait + and trait - populations to be included in association studies such as proposed 
in the present invention consist of phenotypically homogenous populations of individuals each 
representing 100% of the corresponding trait if the trait distribution is bimodal. 

A first group of between 50 and 300 trait + individuals, preferably about 100 individuals, is
recruited according to clinical inclusion criteria based on either 1°) affection by prostate cancer, or
2°) evidence of aggressiveness of prostate cancer tumors.

In each case, a similar number of trait - individuals, preferably more than 100 individuals, who
are preferably both ethnically- and age-matched to the trait positive cases, are included in such
studies. They are checked for the absence of the clinical criteria defined above. Both trait +
and trait - individuals should correspond to unrelated cases.


In one of the preferred embodiments of the present invention, association studies are carried
out on the basis of a presence (trait +) or absence (trait -) of prostate cancer. The trait - group of
individuals is described in detail in Example 6. Several groups of trait + individuals were used:
(1) independent individuals consisting of both patients diagnosed before the age of 65 years old
and patients diagnosed after the age of 65 years old, these two latter groups being also tested
separately; (2) groups of unrelated individuals consisting of prostate cancer sporadic cases; and
(3) groups of unrelated individuals consisting of familial prostate cancer cases consisting of both
patients diagnosed before the age of 65 years old and patients diagnosed after the age of 65
years old, these two latter groups being also tested separately. A specific protocol for the
collection of DNA samples from trait + and trait - individuals is described in Example 6.

In order to have as much certainty as possible on the absence of prostate cancer in trait - 
individuals, it is preferred to conduct a PSA dosage analysis on this population. Several 
commercial assays can be used (WO 96/21042, incorporated herein by reference). In one preferred
embodiment, a Hybritech assay is used and trait - individuals must have a level of PSA less
than 2.8 ng/ml of serum in order to be selected as such. In another preferred embodiment, the Yang
assay is used and trait - individuals must have a level of PSA of less than 4 ng/ml of serum in 
order to be included in the population under study. 

Genotyping of trait + and trait - individuals 

Allelic frequencies of the biallelic markers in each of the above described populations can be
determined using one of the methods described above under the heading "Genotyping of
biallelic markers". Analyses are preferably performed on amplified fragments obtained by
genomic PCR performed on the DNA samples from each individual in similar conditions as
those described above for the generation of biallelic markers.


In a preferred embodiment, amplified DNA samples are subjected to automated 
microsequencing reactions using fluorescent ddNTPs (specific fluorescence for each ddNTP) 
and the appropriate microsequencing oligonucleotides which hybridize just upstream of the 
polymorphic base. 

Genotyping is further described in Example 7.



Association studies 

In one preferred embodiment of the invention, a correlation was found between the 99-430-352
biallelic marker of the TBC-1 gene and prostate cancer. Results of the association study are
further described in detail in Example 8.

Similar association studies can also be carried out with other biallelic markers within the scope
of the invention, preferably with biallelic markers in LD with the markers associated with prostate
cancer as described above, including the biallelic markers of SEQ ID Nos 7-8.

Similar association studies can be carried out by the skilled technician using the biallelic
markers of the invention defined above, with different trait + and trait - populations. Suitable
further examples of association studies using biallelic markers of the TBC-1 gene, including the
biallelic markers of SEQ ID Nos 7-8, involve studies on the following populations:

- a trait + population suffering from a cancer and a healthy unaffected population, or

- a trait + population suffering from prostate cancer treated with agents acting against prostate
cancer and suffering from side-effects resulting from this treatment and a trait - population
suffering from prostate cancer treated with the same agents without any substantial side-effects, or

- a trait + population suffering from prostate cancer treated with agents acting against prostate
cancer showing a beneficial response and a trait - population suffering from prostate cancer
treated with the same agents without any beneficial response, or

- a trait + population suffering from prostate cancer presenting highly aggressive prostate
cancer tumors and a trait - population suffering from prostate cancer with prostate cancer
tumors devoid of aggressiveness.

E- IDENTIFICATION OF A TRAIT CAUSING MUTATION IN THE TBC-1 GENE

Mutations in the TBC-1 gene which are responsible for a detectable phenotype may be 
identified by comparing the sequences of the TBC-1 genes from trait + and trait - individuals.
Preferably, trait + individuals to be sequenced carry a single marker allele or a haplotype shown
to be associated with the trait and trait - individuals to be sequenced do not carry such allele or
haplotype associated with the trait. The detectable phenotype may comprise prostate cancer, a
response to or side effects related to a prophylactic or curative agent acting against prostate
cancer, the aggressiveness of prostate cancer tumors, expression of the TBC-1 gene, a
modified or forthcoming production of the TBC-1 protein, or the production of a modified TBC-1
protein. The mutations may comprise point mutations, deletions, or insertions in the TBC-1
gene. The mutations may lie within the coding sequence for the TBC-1 protein or within 
regulatory regions in the TBC-1 gene, such as its promoter. 

The method used to detect such mutations generally comprises the following steps : 

- amplification of a region of the TBC-1 gene comprising a biallelic marker or a group of
biallelic markers associated with the considered trait from DNA samples of trait + patients
and trait - controls; 

- sequencing of the amplified region; 

- comparison of DNA sequences from trait + patients and trait - controls; and 

- determination of mutations specific to trait + patients. 
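
The comparison and determination steps listed above can be sketched as follows. This is a simplified illustration under stated assumptions: the sequences are assumed to be already aligned to a common length, insertions and deletions are not handled, and the function name and example sequences are hypothetical.

```python
def trait_specific_variants(case_seqs, control_seqs):
    """List positions carrying a base seen in trait + but never in trait - samples.

    Both arguments are lists of equal-length, pre-aligned sequences.
    Returns (position, bases) tuples with 0-based positions.
    """
    lengths = {len(s) for s in case_seqs + control_seqs}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")
    variants = []
    for pos in range(lengths.pop()):
        case_bases = {s[pos] for s in case_seqs}
        control_bases = {s[pos] for s in control_seqs}
        only_in_cases = case_bases - control_bases
        if only_in_cases:
            variants.append((pos, sorted(only_in_cases)))
    return variants


# Made-up aligned reads, for illustration only.
cases = ["ACGTTA", "ACGATA"]
controls = ["ACGTTA", "ACGTTA"]
candidates = trait_specific_variants(cases, controls)
```

Positions reported this way are only candidates; as the text goes on to explain, they still need to be verified in a larger population of trait + and trait - individuals.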

Oligonucleotide primers are constructed as described previously to amplify the sequences of 
each of the exons, introns, the promoter region and the regulatory regions of the TBC-1 gene.
The oligonucleotide primers may comprise at least 10, preferably 15, more preferably 20, and
even more preferably 25 or more consecutive nucleotides of the TBC-1 genomic DNA or the
TBC-1 cDNA or the sequences complementary thereto.

Each primer pair is used to amplify the exon, promoter region, introns or regulatory regions, 
from which it is derived. Amplification is carried out on genomic DNA samples from trait + 
patients and trait - controls, preferably using the PCR conditions described in the examples.
Amplification products from the genomic PCRs are then subjected to sequencing, preferably
through automated dideoxy terminator sequencing reactions, and electrophoresed, preferably on
ABI 377 sequencers. Following gel image analysis and DNA sequence extraction, ABI
sequence data are automatically analyzed to detect the presence of sequence variations among
trait + and trait - individuals. Sequences are verified by determining the sequences of both DNA 
strands for each individual. 

Candidate polymorphisms suspected of being responsible for the detectable phenotype are
then verified by screening a larger population of trait + and trait - individuals using polymorphism
analysis techniques such as the techniques described above. Polymorphisms which exhibit a
statistically significant correlation with the detectable phenotype are deemed responsible for the
detectable phenotype.

In another embodiment, the mutant TBC-1 allele which causes a detectable phenotype can be
isolated by obtaining a nucleic acid sample such as a genomic library or a cDNA library from an 
individual expressing the detectable phenotype. The nucleic acid sample can be contacted with 
one or more probes lying in the region of the TBC-1 gene where the associated biallelic marker
or group of biallelic markers is located, or with PCR-typeable primers specific to the amplification
of this biallelic marker or group of biallelic markers. The mutation can be identified by conducting
sequencing reactions on the nucleic acids which hybridize with the probes defined herein or
which show amplification by PCR.

The region of the TBC-1 gene containing the mutation responsible for the detectable phenotype 
may be used in diagnostic techniques such as those described below. For example, 
microsequencing oligonucleotides, or oligonucleotides containing the mutation responsible for 
the detectable phenotype for amplification, or hybridization based diagnostics, such as those
described herein, may be used for detecting individuals suffering from the detectable phenotype
or individuals at risk of developing the detectable phenotype at a subsequent time. In addition,
the TBC-1 allele responsible for the detectable phenotype may be used in gene therapy. The
TBC-1 allele responsible for the detectable phenotype may also be cloned into an expression
vector to express the mutant TBC-1 protein as described herein. 

F- DETECTION OF MARKERS OR GROUPS OF MARKERS ASSOCIATED WITH A TRAIT 

The invention concerns a diagnostic method for the detection in an individual of markers or 
groups of markers associated with a trait which may include prostate cancer susceptibility, an 
early onset of prostate cancer, or the aggressiveness of prostate cancer tumors. The 
information obtained using this method is useful in the diagnosis, staging, monitoring, prognosis 
and/or prophylactic or curative therapy of prostate cancer. The method also concerns the 
detection of specific alleles present within a gene expressing a modified level of TBC-1 mRNA 
or an altered TBC-1 mRNA coding for an altered TBC-1 protein. More particularly, the invention 
concerns the detection of a TBC-1 nucleic acid comprising at least one of the nucleotide 
sequences of SEQ ID Nos 7-8. 

This method comprises the following steps : 

- obtaining a nucleic acid sample from the individual to be tested, and 

- determining the presence in the sample of an allele of a biallelic marker or of a group of 
biallelic markers of the TBC-1 gene which, when taken alone or in combination with 
another/other biallelic marker/s of the TBC-1 gene, is indicative of prostate cancer, of an early 
onset of prostate cancer, of the level of aggressiveness of prostate cancer tumors, of a modified 
or forthcoming expression of the TBC-1 gene, of a modified or forthcoming production of the 
TBC-1 protein, or of the production of a modified TBC-1 protein. 

In some embodiments, the biallelic marker comprises at least one of the sequences of SEQ ID 
Nos 7-8, or a fragment thereof including the polymorphic base. 

More particularly, the detection method of the present invention comprises the following steps: 

- obtaining a nucleic acid sample from the individual to be tested, 

- amplifying a nucleotide sequence of the TBC-1 gene contained in the sample, and 

- detecting the presence in the sample of an allele of a biallelic marker or of a group of biallelic 
markers of the TBC-1 gene which, when taken alone or in combination with another/other 
biallelic marker/s of the TBC-1 gene, is indicative of prostate cancer, of an early onset of 
prostate cancer, of the level of aggressiveness of prostate cancer tumors, of a modified or 
forthcoming expression of the TBC-1 gene, of a modified or forthcoming production of the TBC-1 
protein, or of the production of a modified TBC-1 protein. 

In some embodiments, the biallelic marker comprises at least one of the sequences of SEQ ID 
Nos 7-8. In preferred embodiments of the two detection methods described above, the presence 
of alleles of one or more biallelic markers of the TBC-1 gene is determined through 
microsequencing reactions using microsequencing primers such as those of SEQ ID Nos 11-12. 

More particularly, it is preferred that the microsequencing primers be bound to a solid support, 
preferably in the form of arrays of primers attached to appropriate chips or be used in 
microfluidic devices. Such arrays are described in further detail in the next section. 

Alternatively, the detection method of the present invention can comprise the following steps: 

- obtaining a nucleic acid sample from the individual to be tested, 

- specifically amplifying a nucleotide sequence of the TBC-1 gene comprising an allele of a 
biallelic marker or of a group of biallelic markers of a TBC-1 gene which, when taken alone or in 
combination with another/other biallelic marker/s of a TBC-1 gene, is indicative of prostate 
cancer, of an early onset of prostate cancer, of the level of aggressiveness of prostate cancer 
tumors, of a modified or forthcoming expression of the TBC-1 gene, of a modified or forthcoming 
production of the TBC-1 protein, or of the production of a modified TBC-1 protein, and 

- detecting the presence of an amplification product. 

In some embodiments, the biallelic marker comprises at least one of the sequences of SEQ ID 
Nos 7-8. 

In a further embodiment of the present invention, another detection method comprises the 
following steps: 

- contacting a hybridization probe in accordance with the invention with a biological sample, the 
DNA of which can optionally be pretreated to be rendered accessible to hybridization, under 
conditions which allow hybridization of the probe to the nucleotide sequence of the TBC-1 gene 
comprising an allele of a biallelic marker of the present invention, and 

- revealing the formation of a hybrid comprising the probe and a nucleotide sequence from the 
biological sample. 

In some embodiments, the biallelic marker comprises at least one of the sequences of SEQ ID 
Nos 7-8. 

Preferably, the formation of a hybrid is revealed through the detection of a signal from a label 
attached to the probe. More preferably, the signal is amplified prior to being revealed. 

In yet another embodiment of the present invention, a further detection method comprises the 
following steps: 

- contacting a hybridization probe in accordance with the invention, immobilized on a solid 
support, with a biological sample, the DNA of which can optionally be pretreated to be rendered 
accessible to hybridization, under conditions which allow hybridization of the probe to the 
nucleotide sequence of the TBC-1 gene comprising an allele of a biallelic marker of the present 
invention, 

- contacting the hybrid thus formed, optionally after having removed the nucleotide sequences 
which did not hybridize with the probe, with a labeled hybridization probe, and 

- revealing the formation of a hybrid comprising the hybridization probe, a nucleotide sequence 
from the biological sample and the labeled probe. 

In some embodiments, the biallelic marker comprises at least one of the sequences of SEQ ID 
Nos 7-8. 

Preferably, the nucleotide sequences of the biological sample are amplified prior to hybridization 
using one of the primers described herewith. 

The invention also specifically relates to a method of determining whether an individual suffering 
from prostate cancer or susceptible to developing prostate cancer is likely to respond positively 
to treatment with a selected medicament acting against prostate cancer. 

The method comprises the following steps: 

- obtaining a DNA sample from the individual to be tested, and 

- analyzing said DNA sample to determine whether it comprises alleles of one or more biallelic 
markers associated with a positive response to treatment with the medicament and/or alleles of 
one or more biallelic markers associated with a negative response to treatment with the 
medicament. 

The detection methods of the present invention can be applied to, for example, the preliminary 
screening of patient populations suffering from prostate cancer. This preliminary screening is 
useful to initiate adequate treatment when needed or to determine and select appropriate 
patient populations for clinical trials on new compounds in order to avoid the potential 
occurrence of specific side effects or to enhance the probability of beneficial patient response. 
By establishing in advance a homogeneous genotype selection for the population to be tested, 
the assessment of drug efficacy and/or toxicity can be more readily achieved and less 
hampered by divergences in population response. This approach can yield better therapeutic 
approaches based on patient population targeting resulting from pharmacogenomics studies. 

Further details on the two main process steps of this method are provided below. 

Nucleic acid samples 

A nucleic acid sample is obtained from an individual to be tested for susceptibility to one of the 
traits referred to above. The nucleic acid sample may be an RNA sample or a DNA sample, but 
is preferably genomic DNA. Any human-derived sample containing the nucleic acid of interest 
can be employed without any particular limitation. As examples, there may be mentioned body 
fluids such as blood, bone marrow fluid, semen and peritoneal fluid, and prostate and other 
tissue cells. 

Appropriate genomic DNA can be prepared by extracting and purifying from these samples by 
conventional methods. Concentrated test samples can also be obtained in large quantities by 
amplifying, based on the genomic DNA, a region likely to contain a biallelic marker or a 
group of biallelic markers as described above, or biallelic markers in LD therewith, responsible 
for the given trait. This is carried out, for example, by the PCR technique, employing primers 
adequately designed so that the desired region is amplified. 

Preferably, the nucleic acid sample subject to analysis in this method is genomic and in such a 
case, an amplifying step may be applied to the nucleic acid sample, primed from a forward 
primer sufficiently complementary to a first subsequence of the antisense strand of the TBC-1 
gene to hybridize therewith, and a reverse primer sufficiently complementary to a second 
sequence of the sense strand of the TBC-1 gene to hybridize therewith. Amplification products 
generated from these primers are analyzed using methods known to the skilled technician and 
which are described in further detail in the next section. 
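
The strand relationships of the two primers can be made concrete with a short sketch. It assumes exact matching for simplicity, whereas the text only requires the primers to be sufficiently complementary to hybridize; the function names and the toy sequence are hypothetical.

```python
# Complement table for unambiguous DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def predict_amplicon(sense, forward, reverse):
    """Return the sense-strand amplicon bounded by the two primers, or None.

    A forward primer complementary to the antisense strand reads the same as
    the sense strand, so it is searched for directly. A reverse primer
    complementary to the sense strand is searched for as its reverse
    complement, downstream of the forward primer site.
    Assumption: exact matches only (no mismatch tolerance).
    """
    start = sense.find(forward)
    if start < 0:
        return None
    reverse_site = reverse_complement(reverse)
    site = sense.find(reverse_site, start + len(forward))
    if site < 0:
        return None
    return sense[start:site + len(reverse_site)]


if __name__ == "__main__":
    # Toy sense-strand sequence, invented for illustration only.
    toy_sense = "TTACGGATCCAGTCAAGGCTTAA"
    print(predict_amplicon(toy_sense, "ACGGAT", "AGCCTT"))
```

A real primer design would also consider melting temperature, mismatch tolerance and specificity, none of which this sketch attempts.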

More particularly, it is desirable to amplify genomic DNA regions likely to harbor a 
biallelic marker such as that disclosed in example 4. 

The preferred primers that can be used to amplify such regions are disclosed in example 3. 

Detection of biallelic markers in the amplified nucleic acid samples 

Once amplification of the DNA has been carried out or once the appropriate DNA sample has 
been obtained if no amplification takes place, the identities of the polymorphic bases of one or 
more of the biallelic markers described above or biallelic markers in LD with the sequences of such 
markers, or a combination thereof, are determined. The identities of the polymorphic bases may 
be determined using any of the genotyping procedures described above in "Genotyping of 
biallelic markers". 

For example, microsequencing with any primer having a 3' end near the polymorphic nucleotide, 
and preferably immediately adjacent to the polymorphic nucleotide, may be used. Alternatively, 
the PCR product may be completely sequenced to determine the identities of the polymorphic 
bases in the biallelic markers. In another method, the identities of the polymorphic bases in the 
biallelic markers are determined by hybridizing the amplification products to allele specific 
oligonucleotides specific for the polymorphic bases in the biallelic markers, such as the probes 
described previously. 
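
The geometry of a microsequencing primer whose 3' end lies immediately adjacent to the polymorphic nucleotide can be sketched as follows. The function name, the default primer length and the toy sequence are assumptions made for illustration; they are not taken from SEQ ID Nos 11-12.

```python
def microsequencing_primer(sense, snp_pos, primer_len=19):
    """Return (primer, extended_base) for a sense-strand microsequencing primer.

    snp_pos is the 1-based position of the polymorphic base on `sense`.
    The primer ends at the base just upstream of snp_pos, so a single-base
    extension incorporates the nucleotide found at the polymorphic site.
    primer_len=19 is an arbitrary illustrative default.
    """
    idx = snp_pos - 1  # convert to a 0-based index
    if idx < primer_len or idx >= len(sense):
        raise ValueError("not enough upstream sequence, or position out of range")
    primer = sense[idx - primer_len:idx]
    return primer, sense[idx]


if __name__ == "__main__":
    # Toy template, invented for illustration only.
    print(microsequencing_primer("GATTACAGGCTAACG", 9, primer_len=5))
```

The same idea applies on the opposite strand by working on the reverse complement of the template.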

More preferably, the identities of the polymorphic bases can be determined with "large scale" 
genotyping procedures, such as those involving the hybridization with DNA chips or integrated 
microsequencing systems, which are described above. An example of the microsequencing 
analysis alternative on oligonucleotide arrays is described in Pastinen et al. (1997), the 
disclosure of which is incorporated herein by reference. In another example, microarrays (DNA 
chips) comprising allele specific oligonucleotides can be used and are described above. 

It will be appreciated that the identities of the polymorphic bases in the biallelic markers may be 
determined using techniques other than those listed above, such as conventional dot blot 
analyses. 

The invention also relates to diagnostic kits useful for determining the presence in a DNA 
sample of alleles associated with a prostate cancer susceptibility, or the aggressiveness of 
prostate cancer tumors, or more generally of a modified TBC-1 production and/or metabolism, 
preferably of a production of an altered TBC-1 protein. 

In a first embodiment, the kit comprises primers such as those described above, preferably 
forward and reverse primers which are used to amplify the TBC-1 gene, particularly the genomic 
sequence of SEQ ID No 1, or a fragment thereof. In some embodiments, at least one of the 
primers is complementary to a nucleotide sequence of the TBC-1 gene comprising a biallelic 
marker associated with prostate cancer, with an early onset of prostate cancer, with the level of 
aggressiveness of prostate cancer tumors, with a modified or forthcoming expression of the 
TBC-1 gene, with a modified or forthcoming production of the TBC-1 protein, or with the 
production of a modified TBC-1 protein. In one embodiment, the biallelic marker comprises one 
of the sequences of SEQ ID Nos 7-8. In a preferred embodiment, the kit comprises one or more 
of the sequences of SEQ ID Nos 9-10. 

In a second embodiment, the kit comprises microsequencing primers, preferably those of SEQ 
ID Nos 11-12. 

In a third embodiment, the kit comprises primers for the amplification of the first or second TBC-1 
cDNAs, such as the sequences of SEQ ID Nos 13 and 14. 

In a fourth embodiment, the kit comprises a primer which is complementary to any nucleotide 
sequence of the TBC-1 gene, particularly of the genomic sequence of SEQ ID No 1, and is used 
to amplify the TBC-1 gene or a fragment thereof contained in the nucleic acid sample to be 
tested which includes a polymorphic base of at least one biallelic marker. Preferably, the 
amplified region includes a polymorphic base of at least one biallelic marker selected from the 
group consisting of SEQ ID Nos 7-8. In some embodiments, the primer comprises one of the 
sequences of SEQ ID Nos 9-10. 

In a fifth embodiment, the kit comprises a DNA probe, that is or eventually becomes immobilized 
on a solid support, which is capable of hybridizing with the TBC-1 gene, preferably with a region 
of the TBC-1 gene which comprises a biallelic marker of the present invention. The techniques 
for immobilizing a nucleotide primer or probe on a solid support are well-known to the skilled 
person and include, but are not limited to, the immobilization techniques described in the 
present application. 

The kits of the present invention can also comprise optional elements including appropriate 
amplification reagents such as DNA polymerases when the kit comprises primers, reagents 
useful in hybridization reactions and reagents useful to reveal the presence of a hybridization 
reaction between a labeled hybridization probe and the TBC-1 gene containing at least one 
biallelic marker. In one embodiment, the biallelic marker comprises one of the sequences of 
SEQ ID Nos 7-8. 

G- TREATMENT OF PROSTATE CANCER 

As the metastasis of prostate cancer can be fatal, it is important to detect prostate cancer 
susceptibility of individuals. Consequently, the invention also concerns a method for the 
treatment of prostate cancer comprising the following steps: 

- selecting an individual whose DNA comprises alleles of a biallelic marker or of a group of 
biallelic markers, preferably markers of the TBC-1 gene, associated with prostate cancer; 

- following up said individual for the appearance (and optionally the development) of tumors in 
prostate; and 

- administering a treatment acting against prostate cancer to said individual at an appropriate 
stage of the prostate cancer. 

In some embodiments, the biallelic marker comprises at least one of the sequences of SEQ ID 
Nos 7-8. 

Another embodiment of the present invention consists of a method for the treatment of prostate 
cancer comprising the following steps: 

- selecting an individual whose DNA comprises alleles of a biallelic marker or of a group of 
biallelic markers, preferably markers of the TBC-1 gene, associated with prostate cancer; 

- administering a preventive treatment of prostate cancer such as 4HPR to said individual. 

In some embodiments, the biallelic marker comprises at least one of the sequences of SEQ ID 
Nos 7-8. 

In a further embodiment, the present invention concerns a method for the treatment of prostate 
cancer comprising the following steps: 

- selecting an individual whose DNA comprises alleles of a biallelic marker or of a group of 
biallelic markers, preferably markers of the TBC-1 gene, associated with prostate cancer, 

- administering a preventive treatment of prostate cancer such as 4HPR to said individual; 

- following up said individual for the appearance and the development of tumors in prostate; and 
optionally 

- administering a treatment acting against prostate cancer to said individual at the appropriate 
stage of the prostate cancer. 

In some embodiments, the biallelic marker comprises at least one of the sequences of SEQ ID 
Nos 7-8. 

To guide the choice of the appropriate time to begin the treatment of prostate cancer, the 
present invention also concerns a method for the treatment of prostate cancer comprising the 
following steps: 

- selecting an individual suffering from a prostate cancer whose DNA comprises alleles of a 
biallelic marker or of a group of biallelic markers, preferably markers of the TBC-1 gene, 
associated with the aggressiveness of prostate cancer tumors; and 

- administering a treatment acting against prostate cancer to said individual. 

In some embodiments, the biallelic marker comprises at least one of the sequences of SEQ ID 
Nos 7-8. 

The invention also concerns a method for the treatment of prostate cancer in a selected 
population of individuals. The method comprises : 

- selecting an individual suffering from prostate cancer and whose DNA comprises alleles of a 
biallelic marker or of a group of biallelic markers, preferably markers of the TBC-1 gene, 
associated with a positive response to treatment with an effective amount of a medicament 
acting against prostate cancer, 

- and/or whose DNA does not comprise alleles of a biallelic marker or of a group of biallelic 
markers, preferably markers of the TBC-1 gene, associated with a negative response to 
treatment with said medicament; and 

- administering at suitable intervals an effective amount of said medicament to said selected 
individual. 

In some embodiments, the biallelic marker comprises at least one of the sequences of SEQ ID 
Nos 7-8. 

In the context of the present invention, a "positive response" to a medicament can be defined as 
comprising a reduction of the symptoms related to the disease or condition to be treated, or an 
increase of survival time. 

In the context of the present invention, a "negative response" to a medicament can be defined 
as comprising either a lack of positive response to the medicament, which does not lead to a 
symptom reduction or an increase of survival time, or a side effect observed following 
administration of the medicament. 

The invention also relates to a method of determining whether a subject is likely to respond 
positively to treatment with a medicament. 

The method comprises identifying a first population of individuals who respond positively to said 
medicament and a second population of individuals who respond negatively to said 
medicament. One or more biallelic markers is identified in the first population which is 
associated with a positive response to said medicament or one or more biallelic markers is 
identified in the second population which is associated with a negative response to said 
medicament. The biallelic markers may be identified using the techniques described herein. 
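
One simple way to check whether an allele is associated with response is a test on allele counts in the two populations. The sketch below uses a Pearson chi-square test with one degree of freedom on a 2x2 table; this particular statistic is an assumption chosen for illustration, and the function name and the counts are hypothetical.

```python
import math


def allele_association(pos_counts, neg_counts):
    """Pearson chi-square test (1 d.f.) on a 2x2 table of allele counts.

    pos_counts and neg_counts are (count of allele 1, count of allele 2)
    observed in positive responders and negative responders respectively.
    Returns (chi2, p_value).
    """
    a, b = pos_counts
    c, d = neg_counts
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("each row and column of the table must be non-empty")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For one degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Invented allele counts, for illustration only.
    chi2, p = allele_association((30, 10), (10, 30))
    print(f"chi2 = {chi2:.2f}, p = {p:.2g}")
```

With small counts an exact test would be more appropriate, and testing several markers would call for a multiple-testing correction.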

A DNA sample is then obtained from the subject to be tested. The DNA sample is analyzed to 
determine whether it comprises alleles of one or more biallelic markers associated with a 
positive response to treatment with the medicament and/or alleles of one or more biallelic 
markers associated with a negative response to treatment with the medicament. 

In some embodiments, the medicament may be administered to the subject in a clinical trial if 
the DNA sample contains alleles of one or more biallelic markers associated with a positive 
response to treatment with the medicament and/or if the DNA sample lacks alleles of one or 
more biallelic markers associated with a negative response to treatment with the medicament. 
In preferred embodiments, the medicament is a drug acting against prostate cancer. In other 
embodiments, the biallelic marker has a sequence selected from the group consisting of SEQ ID 
Nos 7-8. 

Using the method of the present invention, the evaluation of drug efficacy may be conducted in 
a population of individuals likely to respond favorably to the medicament. 

Another aspect of the invention is a method of using a medicament comprising obtaining a DNA 
sample from a subject, determining whether the DNA sample contains alleles of one or more 
biallelic markers associated with a positive response to the medicament and/or whether the 
DNA sample contains alleles of one or more biallelic markers associated with a negative 
response to the medicament, and administering the medicament to the subject if the DNA 
sample contains alleles of one or more biallelic markers associated with a positive response to 
the medicament and/or if the DNA sample lacks alleles of one or more biallelic markers 
associated with a negative response to the medicament. 
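
The "and/or" selection rule stated above can be written as a small decision function. This is only a sketch: the allele labels, the function name and the `require_both` switch are hypothetical, the switch being introduced here to make the two readings of "and/or" explicit.

```python
def may_administer(sample_alleles, positive_alleles, negative_alleles,
                   require_both=False):
    """Apply the selection rule to the alleles detected in a DNA sample.

    require_both=False: administer if the sample carries a positive-response
    allele OR lacks every negative-response allele (the "or" reading).
    require_both=True: both conditions must hold (the "and" reading).
    """
    sample = set(sample_alleles)
    has_positive = bool(sample & set(positive_alleles))
    lacks_negative = not (sample & set(negative_alleles))
    if require_both:
        return has_positive and lacks_negative
    return has_positive or lacks_negative


if __name__ == "__main__":
    # Hypothetical allele labels of the form "marker:base".
    positive = {"markerA:G"}
    negative = {"markerB:T"}
    print(may_administer({"markerA:G", "markerB:T"}, positive, negative))
    print(may_administer({"markerA:G", "markerB:T"}, positive, negative,
                         require_both=True))
```

Which reading applies in a given trial is a clinical design decision that the sketch does not settle.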

The invention also concerns a method for the clinical testing of a medicament, preferably a 
medicament acting against prostate cancer. The method comprises the following steps: 

- administering a medicament, preferably a medicament susceptible of acting against prostate 
cancer, to a heterogeneous population of individuals, 

- identifying a first population of individuals who respond positively to said medicament and a 
second population of individuals who respond negatively to said medicament, 

- identifying biallelic markers in said first population which are associated with a positive response 
to said medicament, 

- selecting individuals whose DNA comprises biallelic markers associated with a positive 
response to said medicament, and 

- administering said medicament to said individuals. 

Such methods are deemed to be extremely useful to increase the benefit/risk ratio resulting 
from the administration of medicaments which may cause undesirable side effects and/or be 
inefficacious to a portion of the patient population to which they are normally administered. 

Once an individual has been diagnosed as suffering from a prostate cancer, selection tests are 
carried out to determine whether the DNA of this individual comprises alleles of a biallelic 
marker or of a group of biallelic markers associated with a positive response to treatment or with 
a negative response to treatment which may include either side effects or unresponsiveness. 

The selection of the patient to be treated using the method of the present invention can be 
carried out through the detection methods described above. The individuals which are to be 
selected are preferably those whose DNA does not comprise alleles of a biallelic marker or of a 
group of biallelic markers associated with a negative response to treatment. The knowledge of 
an individual's genetic predisposition to unresponsiveness or side effects to particular 
medicaments allows the clinician to direct treatment toward appropriate drugs against prostate 
cancer. 

Once the patient's genetic predispositions have been determined, the clinician can select 
appropriate treatment for which negative response, particularly side effects, has not been 
reported or has been reported only marginally for the patient. 

H. EXPRESSION OF A TBC-1 REGULATORY OR CODING POLYNUCLEOTIDE OF THE 
INVENTION. 

Any of the regulatory polynucleotides or the coding polynucleotides of the invention may be 
inserted into recombinant vectors for expression in a recombinant host cell or a recombinant host 
organism. 

Thus, the present invention also encompasses a family of recombinant vectors that contains either 
a regulatory polynucleotide selected from the group consisting of any one of the regulatory 
polynucleotides derived from the TBC-1 genomic sequence of SEQ ID No 1, or a polynucleotide 
comprising the TBC-1 coding sequence, or both. 

In a first preferred embodiment, a recombinant vector of the invention is used as an expression 
vector : (a) the TBC-1 regulatory sequence comprised therein drives the expression of a coding 
polynucleotide operably linked thereto; (b) the TBC-1 coding sequence is operably linked to 
regulation sequences allowing its expression in a suitable cell host and/or host organism. 

In a second preferred embodiment, a recombinant vector of the invention is used to amplify the 
inserted polynucleotide, derived from the TBC-1 genomic sequence of SEQ ID No 1 or TBC-1 
cDNAs, in a suitable cell host, this polynucleotide being amplified every time that the 
recombinant vector replicates. 

More particularly, the present invention relates to expression vectors which include nucleic acids 
encoding a TBC-1 protein, preferably the TBC-1 protein of the amino acid sequence of SEQ ID No 
5 described therein, under the control of a regulatory sequence selected among the TBC-1 
regulatory polynucleotides, or alternatively under the control of an exogenous regulatory sequence. 

A recombinant expression vector comprising a nucleic acid selected from the group consisting of 
SEQ ID No 2, or biologically active fragments or variants thereof, is also part of the present 
invention. 

The invention also encompasses a recombinant expression vector comprising : 

a) a nucleic acid comprising a regulatory polynucleotide of the nucleotide sequence SEQ ID No 
2, or a biologically active fragment or variant thereof; 

b) a polynucleotide encoding a polypeptide or a polynucleotide of interest operably linked with 
said nucleic acid, 

c) optionally, a nucleic acid comprising a 3'-regulatory polynucleotide, preferably a 3'-regulatory 
polynucleotide of the invention, or a biologically active fragment or variant thereof. 



The nucleic acid comprising the nucleotide sequence of SEQ ID No 2 or a biologically active 
fragment or variant thereof may also comprise the 5'-UTR sequence located between the 
nucleotide at position 1 and the nucleotide at position 170 of SEQ ID No 3, or a biologically 
active fragment or variant thereof. 

The nucleic acid comprising the nucleotide sequence of SEQ ID No 2 or a biologically active 
fragment or variant thereof may also comprise the 5'-UTR sequence located between the 
nucleotide at position 1 and the nucleotide at position 175 of SEQ ID No 4, or a biologically 
active fragment or variant thereof. 

The invention also pertains to a recombinant expression vector useful for the expression of the 
TBC-1 coding sequence, wherein said vector comprises a nucleic acid selected from the group 
consisting of SEQ ID Nos 3 and 4 or a nucleic acid having at least 95% nucleotide identity with 
a polynucleotide selected from the group consisting of the nucleotide sequences of SEQ ID Nos 
3 and 4. 
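
As a rough illustration of a 95% identity criterion, the sketch below computes percent identity over two sequences that are assumed to be already aligned and of equal length. This is a simplification adopted here for illustration: it does not perform any alignment, and the function names and threshold handling are hypothetical.

```python
def percent_identity(seq_a, seq_b):
    """Percent of alignment columns holding the same base in both sequences.

    Assumption: the sequences are pre-aligned and of equal length; a column
    containing a gap character '-' is never counted as a match.
    """
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(
        1 for x, y in zip(seq_a.upper(), seq_b.upper()) if x == y and x != "-"
    )
    return 100.0 * matches / len(seq_a)


def meets_identity_threshold(seq_a, seq_b, threshold=95.0):
    """True if the aligned sequences share at least `threshold` % identity."""
    return percent_identity(seq_a, seq_b) >= threshold


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))
```

In practice the two sequences would first be aligned with a dedicated alignment tool, and the identity value depends on the alignment parameters chosen.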



Another recombinant expression vector of the invention consists of a recombinant vector 
comprising a nucleic acid comprising the nucleotide sequence beginning at the nucleotide in 
position 176 and ending in position 3733 of the polynucleotide of SEQ ID No 4. 

Some of the elements which can be found in the vectors of the present invention are described 
in further detail in the following sections. 

a) Vectors 

A recombinant vector according to the invention comprises, but is not limited to, a YAC (Yeast 
Artificial Chromosome), a BAC (Bacterial Artificial Chromosome), a phage, a phagemid, a cosmid, 
a plasmid or even a linear DNA molecule which may consist of a chromosomal, non-chromosomal 
and synthetic DNA. Such a recombinant vector can comprise a transcriptional unit comprising an 
assembly of : 

(1) a genetic element or elements having a regulatory role in gene expression, for example 
promoters or enhancers. Enhancers are cis-acting elements of DNA, usually from about 10 to 
300 bp in length that act on the promoter to increase the transcription. 

(2) a structural or coding sequence which is transcribed into mRNA and eventually translated 
into a polypeptide, and 

(3) appropriate transcription initiation and termination sequences. Structural units intended for 
use in yeast or eukaryotic expression systems preferably include a leader sequence enabling 
extracellular secretion of translated protein by a host cell. Alternatively, where a recombinant 
protein is expressed without a leader or transport sequence, it may include an N-terminal 
residue. This residue may or may not be subsequently cleaved from the expressed recombinant 
protein to provide a final product. 

Generally, recombinant expression vectors will include origins of replication, selectable markers 
permitting transformation of the host cell, and a promoter derived from a highly expressed gene to 
direct transcription of a downstream structural sequence. The heterologous structural sequence is 
assembled in appropriate phase with translation initiation and termination sequences, and 
preferably a leader sequence capable of directing secretion of the translated protein into the 
periplasmic space or the extracellular medium. 

The selectable marker genes for selection of transformed host cells are preferably dihydrofolate 
reductase or neomycin resistance for eukaryotic cell culture, TRP1 for S. cerevisiae or tetracycline, 
rifampicin or ampicillin resistance in E. coli, or levan saccharase for mycobacteria. 

As a representative but non-limiting example, useful expression vectors for bacterial use can 
comprise a selectable marker and a bacterial origin of replication derived from commercially 
available plasmids comprising genetic elements of pBR322 (ATCC 37017). Such commercial 
vectors include, for example, pKK223-3 (Pharmacia, Uppsala, Sweden), and GEM1 (Promega 
Biotec, Madison, WI, USA). 

Large numbers of suitable vectors and promoters are known to those of skill in the art, and 
commercially available, such as bacterial vectors : pQE70, pQE60, pQE-9 (Qiagen), pbs, pD10, 
phagescript, psiX174, pbluescript SK, pbsks, pNH8A, pNH16A, pNH18A, pNH46A (Stratagene); 
ptrc99a, pKK223-3, pKK233-3, pDR540, pRIT5 (Pharmacia); or eukaryotic vectors : pWLNEO, 
pSV2CAT, pOG44, pXT1, pSG (Stratagene); pSVK3, pBPV, pMSG, pSVL (Pharmacia); 
baculovirus transfer vector pVL1392/1393 (Pharmingen); pQE-30 (QIAexpress). 

A suitable vector for the expression of the TBC-1 polypeptide of SEQ ID No 5 is a baculovirus 
vector that can be propagated in insect cells and in insect cell lines. A specific suitable host 
vector system is the pVL1392/1393 baculovirus transfer vector (Pharmingen) that is used to 
transfect the SF9 cell line (ATCC N°CRL 1711) which is derived from Spodoptera frugiperda. 

Other suitable vectors for the expression of the TBC-1 polypeptide of SEQ ID No 5 in a 
baculovirus expression system include those described by Chai et al. (1993), Vlasak et al.
(1983) and Lenhard et al. (1996). 

Mammalian expression vectors will comprise an origin of replication, a suitable promoter and 
enhancer, and also any necessary ribosome binding sites, polyadenylation site, splice donor and 
acceptor sites, transcriptional termination sequences, and 5' flanking nontranscribed sequences.
DNA sequences derived from the SV40 viral genome, for example SV40 origin, early promoter, 
enhancer, splice and polyadenylation sites may be used to provide the required nontranscribed 
genetic elements. 

b) Promoters 

The suitable promoter regions used in the expression vectors according to the present invention 
are chosen taking into account the cell host in which the heterologous gene has to be 
expressed. 

A suitable promoter may be heterologous with respect to the nucleic acid for which it controls 
the expression or alternatively can be endogenous to the native polynucleotide containing the 
coding sequence to be expressed. Additionally, the promoter is generally heterologous with 
respect to the recombinant vector sequences within which the construct promoter/coding
sequence has been inserted. 

Preferred bacterial promoters are the LacI, LacZ, the T3 or T7 bacteriophage RNA polymerase
promoters, the polyhedrin promoter, or the p10 protein promoter from baculovirus (Kit Novagen)
(Smith et al., 1983; O'Reilly et al., 1992), the lambda PR promoter or also the trc promoter.

Promoter regions can be selected from any desired gene using, for example, CAT 
(chloramphenicol transferase) vectors and more preferably pKK232-8 and pCM7 vectors. 
Particularly preferred bacterial promoters include lacI, lacZ, T3, T7, gpt, lambda PR, PL and trp.
Eukaryotic promoters include CMV immediate early, HSV thymidine kinase, early and late
SV40, LTRs from retrovirus, and mouse metallothionein-I. Selection of a convenient vector and
promoter is well within the level of ordinary skill in the art.

The choice of a promoter is well within the ability of a person skilled in the field of genetic
engineering. For example, one may refer to the book of Sambrook et al. (1989) or also to the
procedures described by Fuller et al. (1996). 

The vector containing the appropriate DNA sequence as described above, more preferably a TBC-1
gene regulatory polynucleotide, a polynucleotide encoding the TBC-1 polypeptide of SEQ ID No
5 or both of them, can be utilized to transform an appropriate host to allow the expression of the
desired polypeptide or polynucleotide. 



c) Other types of vectors 
The in vivo expression of a TBC-1 polypeptide of SEQ ID No 5 may be useful in order to correct
a genetic defect related to the expression of the native gene in a host organism or to the 
production of a biologically inactive TBC-1 protein. 

Consequently, the present invention also deals with recombinant expression vectors mainly
designed for the in vivo production of the TBC-1 polypeptide of SEQ ID No 5 by the introduction
of the appropriate genetic material in the organism of the patient to be treated. This genetic
material may be introduced in vitro in a cell that has been previously extracted from the
organism, the modified cell being subsequently reintroduced in the said organism, directly in
vivo into the appropriate tissue.

By « vector » according to this specific embodiment of the invention is intended either a circular
or a linear DNA molecule. 

One specific embodiment for a method for delivering a protein or peptide to the interior of a cell
of a vertebrate in vivo comprises the step of introducing a preparation comprising a
physiologically acceptable carrier and a naked polynucleotide operatively coding for the
polypeptide of interest into the interstitial space of a tissue comprising the cell, whereby the
naked polynucleotide is taken up into the interior of the cell and has a physiological effect. 

In a specific embodiment, the invention provides a composition for the in vivo production of the 
TBC-1 protein or polypeptide described herein. It comprises a naked polynucleotide operatively
coding for this polypeptide, in solution in a physiologically acceptable carrier, and suitable for 
introduction into a tissue to cause cells of the tissue to express the said protein or polypeptide. 

Compositions comprising a polynucleotide are described in PCT application WO 90/11092 
(Vical Inc.) and also in PCT application N° WO 95/11307 (Institut Pasteur, INSERM, Université
d'Ottawa) as well as in the articles of Tascon et al. (1996) and of Huygen et al. (1996).

The amount of vector to be injected to the desired host organism varies according to the site of
injection. As an indicative dose, between 0.1 and 100 µg of the vector will be injected in an
animal body, preferably a mammal body, for example a mouse body. 

In another embodiment of the vector according to the invention, it may be introduced in vitro in a 
host cell, preferably in a host cell previously harvested from the animal to be treated and more
preferably a somatic cell such as a muscle cell. In a subsequent step, the cell that has been 
transformed with the vector coding for the desired TBC-1 polypeptide or the desired fragment 
thereof is reintroduced into the animal body in order to deliver the recombinant protein within the 
body either locally or systemically. 

In one specific embodiment, the vector is derived from an adenovirus. Preferred adenovirus 
vectors according to the invention are those described by Feldman and Steg (1996) or Ohno et 
al. (1994). Another preferred recombinant adenovirus according to this specific embodiment of 
the present invention is the human adenovirus type 2 or 5 (Ad 2 or Ad 5) or an adenovirus of 
animal origin (French patent application N° FR-93.05954).

Retrovirus vectors and adeno-associated virus vectors are generally understood to be the
recombinant gene delivery systems of choice for the transfer of exogenous polynucleotides in
vivo, particularly to mammals, including humans. These vectors provide efficient delivery of
genes into cells, and the transferred nucleic acids are stably integrated into the chromosomal
DNA of the host.

Particularly preferred retroviruses for the preparation or construction of retroviral in vitro or in
vivo gene delivery vehicles of the present invention include retroviruses selected from the group
consisting of Mink-Cell Focus Inducing Virus, Murine Sarcoma Virus, Reticuloendotheliosis virus
and Rous Sarcoma virus. Particularly preferred Murine Leukemia Viruses include the 4070A
and the 1504A viruses, Abelson (ATCC No VR-999), Friend (ATCC No VR-245), Gross (ATCC
No VR-590), Rauscher (ATCC No VR-998) and Moloney Murine Leukemia Virus (ATCC No
VR-190; PCT Application No WO 94/24298). Particularly preferred Rous Sarcoma Viruses include
Bryan high titer (ATCC Nos VR-334, VR-657, VR-726, VR-659 and VR-728). Other preferred
retroviral vectors are those described in Roth et al. (1996), PCT Application No
WO 93/25234, PCT Application No WO 94/06920, Roux et al., 1989, Julan et al., 1992 and
Neda et al., 1991.

Yet another viral vector system that is contemplated by the invention consists in the
adeno-associated virus (AAV). The adeno-associated virus is a naturally occurring defective virus that
requires another virus, such as an adenovirus or a herpes virus, as a helper virus for efficient
replication and a productive life cycle (Muzyczka et al., 1992). It is also one of the few viruses
that may integrate its DNA into non-dividing cells, and exhibits a high frequency of stable
integration (Flotte et al., 1992; Samulski et al., 1989; McLaughlin et al., 1989). One
advantageous feature of AAV derives from its reduced efficacy for transducing primary cells 
relative to transformed cells. 

Other compositions containing a vector of the invention advantageously comprise an 
oligonucleotide fragment of a nucleic sequence selected from the group consisting of SEQ ID 
Nos 2 or 3 as an antisense tool that inhibits the expression of the corresponding TBC-1 gene.
Preferred methods using antisense polynucleotide according to the present invention are the 
procedures described by Sczakiel et al. (1995) or those described in PCT Application No WO 
95/24223.

Preferably, the antisense tools are chosen among the polynucleotides (15-200 bp long) that are
complementary to the 5' end of the TBC-1 mRNAs. In another embodiment, a combination of
different antisense polynucleotides complementary to different parts of the desired targeted
gene is used.

Preferred antisense polynucleotides according to the present invention are complementary to a 
sequence of the mRNAs of TBC-1 that contains the translation initiation codon ATG.
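
As a rough illustration of how an antisense polynucleotide covering the initiation codon could be derived in silico, the following Python sketch takes the reverse complement of a window of the sense strand that contains the ATG. The input fragment, the function names, the default window size and the choice of the first ATG in the string as the initiation codon are all assumptions made for this example; none of them is taken from the sequences of the invention.

```python
# Illustrative sketch only. The sequence used below is a made-up fragment,
# not a TBC-1 sequence, and all names are hypothetical.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def antisense_over_atg(cdna: str, length: int = 20, upstream: int = 5) -> str:
    """Return an antisense DNA oligonucleotide (5'->3') whose target window
    starts `upstream` bases before the first ATG of the sense strand.

    Assumption: the first ATG found is the translation initiation codon.
    """
    if not 15 <= length <= 200:
        raise ValueError("length is expected to lie between 15 and 200 bases")
    atg = cdna.find("ATG")
    if atg < 0:
        raise ValueError("no ATG found in the sense sequence")
    start = max(0, atg - upstream)
    target = cdna[start:start + length]
    if len(target) < length:
        raise ValueError("sense sequence too short for the requested window")
    return reverse_complement(target)


toy_cdna = "GGCACGAGCCATGGAACCCATCACTTTCACCGCAAGG"  # hypothetical fragment
oligo = antisense_over_atg(toy_cdna)
print(oligo)
```

The oligonucleotide returned is complementary to the window containing the ATG, so it contains the triplet CAT. In practice the window position and length would be chosen experimentally.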

Host cells 

Another object of the invention consists in host cells that have been transformed or transfected with
one of the polynucleotides described herein, and more precisely a polynucleotide either
comprising a TBC-1 regulatory polynucleotide or the coding sequence of the TBC-1 polypeptide
having the amino acid sequence of SEQ ID No 5. Included are host cells that are transformed
(prokaryotic cells) or that are transfected (eukaryotic cells) with a recombinant vector such as one 
of those described above. 

A recombinant host cell of the invention comprises any one of the polynucleotides or the 
recombinant vectors described herein.

A preferred recombinant host cell according to the invention comprises a polynucleotide
selected from the following group of polynucleotides :

a) a purified or isolated nucleic acid encoding a TBC-1 polypeptide, or a polypeptide fragment 
or variant thereof. 

b) a purified or isolated nucleic acid comprising at least 20 consecutive nucleotides of a 
polynucleotide selected from the group consisting of the nucleotide sequences of SEQ ID Nos 3
and 4.

c) a purified or isolated nucleic acid comprising the nucleotide sequence SEQ ID No 2 or a 
biologically active fragment or variant of the nucleotide sequence of SEQ ID No 2. 

d) a purified or isolated nucleic acid comprising a 3'-regulatory sequence of the TBC-1 gene, or
a biologically active fragment or variant thereof. 

e) a polynucleotide consisting of :

(1) a nucleic acid comprising a regulatory polynucleotide of SEQ ID No 2 or a biologically active 
fragment or variant thereof; 

(2) a polynucleotide encoding a desired polypeptide or nucleic acid.

(3) Optionally, a nucleic acid comprising a 3'-regulatory sequence, preferably a 3'-regulatory
sequence of the TBC-1 gene, or a biologically active fragment or variant thereof, wherein
sequences (1), (2) and (3) are operably linked to one another.

Another preferred recombinant cell host according to the present invention is characterized in
that its genome or genetic background (including chromosome, plasmids) is modified by the 
nucleic acid coding for the TBC-1 polypeptide of SEQ ID No 5. 

Preferred host cells used as recipients for the expression vectors of the invention are the
following:

a) Prokaryotic host cells: Escherichia coli strains (i.e. DH5-α strain) or Bacillus subtilis.

b) Eukaryotic host cells: HeLa cells (ATCC N°CCL2; N°CCL2.1; N°CCL2.2), Cv 1 cells (ATCC
N°CCL70), COS cells (ATCC N°CRL1650; N°CRL1651), Sf-9 cells (ATCC N°CRL1711).

The constructs in the host cells can be used in a conventional manner to produce the gene product 
encoded by the recombinant sequence.

Following transformation of a suitable host and growth of the host to an appropriate cell density, the 
selected promoter is induced by appropriate means, such as temperature shift or chemical 
induction, and cells are cultivated for an additional period. 

Cells are typically harvested by centrifugation, disrupted by physical or chemical means, and the 
resulting crude extract retained for further purification.

Microbial cells employed in the expression of proteins can be disrupted by any convenient method, 
including freeze-thaw cycling, sonication, mechanical disruption, or use of cell lysing agents. Such
methods are well known by the skilled artisan.

Transgenic animals 

The terms "transgenic animals" or "host animals" are used herein to designate animals that have 
their genome genetically and artificially manipulated so as to include one of the nucleic acids
according to the invention. Preferred animals are non-human mammals and include those
belonging to a genus selected from Mus (e.g. mice), Rattus (e.g. rats) and Oryctolagus (e.g.
rabbits) which have their genome artificially and genetically altered by the insertion of a nucleic 
acid according to the invention. 

The transgenic animals of the invention all include within a plurality of their cells a cloned
recombinant or synthetic DNA sequence, more specifically one of the purified or isolated nucleic
acids comprising a TBC-1 coding sequence, a TBC-1 regulatory polynucleotide or a DNA
sequence encoding an antisense polynucleotide such as described in the present specification.

Preferred transgenic animals according to the invention contain in their somatic cells and/or in their 
germ line cells a polynucleotide selected from the following group of polynucleotides :

a) a purified or isolated nucleic acid encoding a TBC-1 polypeptide, or a polypeptide fragment 
or variant thereof. 

b) a purified or isolated nucleic acid comprising at least 20 consecutive nucleotides of a
polynucleotide selected from the group consisting of the nucleotide sequences of SEQ ID Nos 3 
and 4. 

c) a purified or isolated nucleic acid comprising the nucleotide sequence SEQ ID No 2 or a 
biologically active fragment or variant of the nucleotide sequence of SEQ ID No 2. 

d) a purified or isolated nucleic acid comprising a 3'-regulatory sequence of the TBC-1 gene, or
a biologically active fragment or variant thereof. 

e) a polynucleotide consisting of : 

(1) a nucleic acid comprising a regulatory polynucleotide of SEQ ID No 2 or a biologically active 
fragment or variant thereof, 

(2) a polynucleotide encoding a desired polypeptide or nucleic acid, 

(3) Optionally, a nucleic acid comprising a 3'-regulatory sequence, preferably a 3'-regulatory
sequence of the TBC-1 gene, or a biologically active fragment or variant thereof, wherein
sequences (1), (2) and (3) are operably linked to one another.

The transgenic animals of the invention thus contain specific sequences of exogenous genetic
material such as the nucleotide sequences described above in detail. 

In a first preferred embodiment, these transgenic animals may be good experimental models in 
order to study the diverse pathologies related to cell differentiation, in particular concerning the
transgenic animals within the genome of which has been inserted one or several copies of a 
polynucleotide encoding a native TBC-1 protein, or alternatively a mutant TBC-1 protein. 

In a second preferred embodiment, these transgenic animals may express a desired polypeptide of 
interest under the control of the regulatory polynucleotides of the TBC-1 gene, leading to good
yields in the synthesis of this protein of interest, and possibly a tissue specific expression of this
protein of interest. 

Since it is possible to produce transgenic animals of the invention using a variety of different 
sequences, a general description will be given of the production of transgenic animals by referring
generally to exogenous genetic material. This general description can be adapted by those skilled
in the art in order to incorporate the DNA sequences into animals. For more details regarding the
production of transgenic animals, and specifically transgenic mice, reference may be made to Sandou et
al. (1994) and also to US Patents Nos 4,873,191, issued Oct. 10, 1989, 5,968,766, issued Dec. 16,
1997 and 5,387,742, issued Feb. 28, 1995, these documents being herein incorporated by
reference to disclose methods for producing transgenic mice. 

Transgenic animals of the present invention are produced by the application of procedures which 
result in an animal with a genome that incorporates exogenous genetic material which is integrated 
into the genome. The procedure involves obtaining the genetic material, or a portion thereof, which
encodes either a TBC-1 coding sequence, a TBC-1 regulatory polynucleotide or a DNA sequence
encoding an antisense polynucleotide such as described in the present specification. 

A recombinant polynucleotide of the invention is inserted into an embryonic stem (ES) cell line.
The insertion is made using electroporation. The cells subjected to electroporation are screened
(e.g. Southern blot analysis) to find positive cells which have integrated the exogenous
recombinant polynucleotide into their genome. An illustrative positive-negative selection procedure
that may be used according to the invention is described by Mansour et al. (1988). Then, the
positive cells are isolated, cloned and injected into 3.5 days old blastocysts from mice. The
blastocysts are then inserted into a female host animal and allowed to grow to term. The offspring
of the female host are tested to determine which animals are transgenic, i.e. include the inserted
exogenous DNA sequence, and which are wild-type.

Thus, the present invention also concerns a transgenic animal containing a nucleic acid, a 
recombinant expression vector or a recombinant host cell according to the invention.

I. TBC-1 POLYPEPTIDE AND PEPTIDE FRAGMENTS 

It is now easy to produce proteins in high amounts by genetic engineering techniques through 
expression vectors such as plasmids, phages or phagemids. The polynucleotide that codes for one of
the polypeptides of the present invention is inserted in an appropriate expression vector in order to
produce the polypeptide of interest in vitro. 

Thus, the present invention also concerns a method for producing one of the polypeptides 
described herein, and especially a polypeptide of SEQ ID No 5 or a fragment or a variant thereof,
wherein said method comprises the steps of : 

a) culturing, in an appropriate culture medium, a cell host previously transformed or transfected
with the recombinant vector comprising a nucleic acid encoding a TBC-1 polypeptide, or a fragment
or a variant thereof; 

b) harvesting the culture medium thus conditioned or lysing the cell host, for example by sonication
or by an osmotic shock; 

c) separating or purifying, from the said culture medium, or from the pellet of the resultant host cell
lysate, the thus produced polypeptide of interest.

d) Optionally characterizing the produced polypeptide of interest. 

In a specific embodiment of the above method, step a) is preceded by a step wherein the nucleic 
acid coding for a TBC-1 polypeptide, or a fragment or a variant thereof, is inserted in an 
appropriate vector, optionally after an appropriate cleavage of this amplified nucleic acid with one 
or several restriction endonucleases. The nucleic acid coding for a TBC-1 polypeptide or a 
fragment or a variant thereof may be the resulting product of an amplification reaction using a pair
of primers according to the invention (by SDA, TAS, 3SR, NASBA, TMA etc.).

The polypeptides according to the invention may be characterized by binding onto an 
immunoaffinity chromatography column on which polyclonal or monoclonal antibodies directed to a
polypeptide of SEQ ID No 5, or a fragment or a variant thereof, have previously been immobilized.

Purification of the recombinant proteins or peptides according to the present invention may be 
carried out by passage onto a Nickel or Copper affinity chromatography column. The Nickel
chromatography column may contain the Ni-NTA resin (Porath et al., 1975). 

The polypeptides or peptides thus obtained may be purified, for example by high performance 
liquid chromatography, such as reverse phase and/or cationic exchange HPLC, as described by 
Rougeot et al. (1994). The reason to prefer this kind of peptide or protein purification is the lack 
of byproducts found in the elution samples which renders the resultant purified protein or 
peptide more suitable for a therapeutic use.

Another object of the present invention consists in a purified or isolated TBC-1 polypeptide or a
fragment or a variant thereof. 



In a preferred embodiment, the TBC-1 polypeptide comprises an amino acid sequence of SEQ ID 
No 5 or a fragment or a variant thereof. 

The TBC-1 polypeptide of the invention shares amino acid homology with the
murine TBC-1 protein of 1141 amino acids in length which is described in US Patent No US
5,700,927. The TBC-1 protein of the invention also possesses some homologies with two other
proteins: the Pollux Drosophila protein (Zhang et al., 1996) and the CDC16 protein from
Caenorhabditis elegans (Wilson et al., 1994). Figure 2 represents an amino acid alignment of a
portion of the amino acid sequence of the TBC-1 protein of SEQ ID No 5 with other proteins
sharing amino acid homology with TBC-1. The upper line shows the whole amino acid sequence
of the murine tbc-1 protein described in US Patent No US 5,700,927; the second line represents
part of the amino acid sequence of the TBC-1 protein of SEQ ID No 5; the third line (Genbank
access No: dmu50542) depicts the amino acid sequence of the Pollux protein mentioned above;
the fourth line (Genbank access No: ceif35h12) shows the amino acid sequence of the C. elegans
protein mentioned above; the fifth line presents positions in which consensus amino acids are 
identified, i.e. amino acids shared by the sequences presented in the four upper lines, when 
present. 

The TBC-1 polypeptide of the amino acid sequence of SEQ ID No 5 is 1185 amino acids in
length. The TBC-1 polypeptide includes a "TBC domain" which spans from the amino acid in
position 786 to the amino acid in position 974 of the amino acid sequence of SEQ ID No 5. This
TBC domain is represented in Figure 2 as a grey area spanning from the amino acid numbered
758 to the amino acid numbered 949. This TBC domain is likely to regulate protein-protein
interactions. Moreover, the TBC-1 TBC domain includes the amino acid sequence EVGYCQGL
spanning from the amino acid in position 886 to the amino acid in position 893 of the amino acid
sequence of SEQ ID No 5. The EVGYCQGL amino acid sequence spans from the amino acid
numbered 861 to the amino acid numbered 868 of Figure 2. This site may interact with a kinase.
Based on the structural similarity to cdc16, a yeast regulator of mitosis, TBC-1 is likely to regulate
mitosis and cytokinesis by interacting with other proteins which also participate in the regulation
of mitosis, cytokinesis and septum formation.

Preferred polypeptides of the invention comprise the TBC domain of TBC-1, or alternatively at least
the EVGYCQGL amino acid sequence motif.
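
The positions quoted above are 1-based and inclusive. The short Python sketch below shows one way such coordinates could be converted into string slices and how the motif could be located. The placeholder protein is a string of alanine residues with the motif planted at position 886. It is an assumption made purely for illustration and is not the sequence of SEQ ID No 5, and the helper names are hypothetical.

```python
# Illustrative sketch only: the placeholder protein is NOT SEQ ID No 5.

TBC_DOMAIN = (786, 974)   # positions stated for SEQ ID No 5
MOTIF = "EVGYCQGL"
MOTIF_SPAN = (886, 893)   # positions stated for SEQ ID No 5
PROTEIN_LENGTH = 1185


def region(seq: str, first: int, last: int) -> str:
    """Extract residues first..last (1-based, inclusive)."""
    if not 1 <= first <= last <= len(seq):
        raise ValueError("coordinates fall outside the sequence")
    return seq[first - 1:last]


def find_motif(seq: str, motif: str):
    """Return the (first, last) 1-based span of the motif, or None if absent."""
    idx = seq.find(motif)
    if idx < 0:
        return None
    return (idx + 1, idx + len(motif))


# Placeholder: alanines everywhere, motif planted at positions 886..893.
placeholder = "A" * (MOTIF_SPAN[0] - 1) + MOTIF + "A" * (PROTEIN_LENGTH - MOTIF_SPAN[1])

domain = region(placeholder, *TBC_DOMAIN)
domain_length = TBC_DOMAIN[1] - TBC_DOMAIN[0] + 1
motif_offset_in_domain = MOTIF_SPAN[0] - TBC_DOMAIN[0]

print(domain_length, find_motif(placeholder, MOTIF), motif_offset_in_domain)
```

With these coordinates the domain is 189 residues long and the eight-residue motif starts 100 residues after the first residue of the domain.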

A further object of the present invention concerns a purified or isolated polypeptide which is
encoded by a nucleic acid comprising a nucleotide sequence selected from the group consisting of
SEQ ID Nos 1, 3, and 4 or fragments or variants thereof. 

In a preferred embodiment, a variant TBC-1 polypeptide comprises amino acid changes ranging 
from 1, 2, 3, 4, 5, 10 to 20 substitutions, additions or deletions of one amino acid, preferably
from 1 to 10, more preferably from 1 to 5 and most preferably from 1 to 3 substitutions, additions
or deletions of one amino acid. The preferred amino acid changes are those which have little or 
no influence on the biological activity or the capacity of the variant TBC-1 polypeptide to be 
recognized by antibodies raised against a native TBC-1 protein. 
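
One possible way to count such changes between a native and a variant sequence is the edit distance, i.e. the minimum number of single-residue substitutions, additions or deletions that converts one sequence into the other. The Python sketch below is an assumption about how the counting could be done and is not a method stated in this specification. The short peptides and the tier labels are hypothetical examples.

```python
# Illustrative sketch only: sequences and labels are hypothetical.

def edit_distance(native: str, variant: str) -> int:
    """Minimum number of single-residue substitutions, additions or deletions
    needed to turn `native` into `variant` (Levenshtein distance)."""
    prev = list(range(len(variant) + 1))
    for i, a in enumerate(native, 1):
        cur = [i]
        for j, b in enumerate(variant, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # addition
                           prev[j - 1] + (a != b)))  # substitution or match
        prev = cur
    return prev[-1]


def preference_tier(changes: int) -> str:
    """Map a number of changes onto the ranges given in the text."""
    if changes == 0:
        return "identical"
    if changes <= 3:
        return "most preferred (1 to 3)"
    if changes <= 5:
        return "more preferred (1 to 5)"
    if changes <= 10:
        return "preferred (1 to 10)"
    if changes <= 20:
        return "within range (up to 20)"
    return "outside the stated range"


native = "MEVGYCQGLKR"          # hypothetical peptide
one_deletion = "MEVGYCQGLR"     # K removed
two_changes = "MDVGYCQGLKRA"    # E->D substitution plus one added residue

print(edit_distance(native, one_deletion), preference_tier(edit_distance(native, one_deletion)))
print(edit_distance(native, two_changes), preference_tier(edit_distance(native, two_changes)))
```

The edit distance is a lower bound: the changes actually introduced when a variant was constructed may be more numerous than the minimum needed to explain the difference.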

A single variant molecule of the TBC-1 protein is explicitly excluded from the scope of the
present invention, which is a polypeptide having the same amino acid sequence as the murine
tbc1 protein described in the US Patent No 5,700,927.

In a second preferred embodiment, a mutated TBC-1 polypeptide comprises amino acid
changes ranging from 1 to about 200 deletions of one amino acid and of at least one amino acid
substitution or addition, preferably from 1 to 10, 20 or 30 amino acid substitutions or additions.
The amino acid substitutions are generally non conservative in terms of polarity, charge,
hydrophilicity properties of the substitute amino acid when compared with the native amino acid.
The amino acid changes occurring in such a mutated TBC-1 polypeptide may be determinant
for the biological activity or for the capacity of the mutated TBC-1 polypeptide to be recognized
by antibodies raised against a native TBC-1.

Amino acid deletions, additions or substitutions in the TBC-1 protein are preferably located 
outside of the TBC domain as defined above. Most preferably, a mutated TBC-1 protein has an
intact "EVGYCQGL" amino acid motif.

Such a mutated TBC-1 protein may be the target of diagnostic tools, such as specific 
monoclonal or polyclonal antibodies, useful for detecting the mutated TBC-1 protein in a 
sample.

In the case of an amino acid substitution in the amino acid sequence of a polypeptide according
to the invention, one or several -consecutive or non-consecutive- amino acids are replaced by
« equivalent » amino acids. The expression « equivalent » amino acid is used herein to designate
any amino acid that may be substituted for one of the amino acids belonging to the native protein
structure without decreasing the binding properties of the corresponding peptides to the
antibodies raised against the TBC-1 protein of the amino acid sequence of SEQ ID No 5. In
other words, the « equivalent » amino acids are those which allow the generation or the synthesis
of a polypeptide with a modified sequence when compared to the amino acid sequence of the
native TBC-1 protein, said modified polypeptide being able to bind to the antibodies raised
against the TBC-1 protein of the amino acid sequence of SEQ ID No 5 and/or to induce 
antibodies recognizing the parent polypeptide consisting in the TBC-1 polypeptide of the amino 
acid sequence of SEQ ID No 5. 

These equivalent amino acids may be determined either by their structural homology with the
initial amino acids to be replaced, by the similarity of their net charge, and optionally by the
results of the cross-immunogenicity between the parent peptides and their modified 
counterparts. 

The peptides containing one or several « equivalent » amino acids must retain their specificity
and affinity properties to the biological targets of the parent protein, as it can be assessed by a 
ligand binding assay or an ELISA assay. 

By an equivalent amino acid according to the present invention is also meant the replacement of
a residue in the L-form by a residue in the D-form or the replacement of a Glutamic acid (E)
residue by a Pyro-glutamic acid compound. The synthesis of peptides containing at least one 
residue in the D-form is, for example, described by Koch (1977). 

A specific embodiment of a modified TBC-1 peptide molecule of interest according to the
present invention, which includes, but is not limited to, a peptide molecule resistant to
proteolysis, is a peptide in which the -CONH- peptide bond is modified and replaced by a
(CH2NH) reduced bond, a (NHCO) retro inverso bond, a (CH2-O) methylene-oxy bond, a
(CH2-S) thiomethylene bond, a (CH2CH2) carba bond, a (CO-CH2) ketomethylene bond, a
(CHOH-CH2) hydroxyethylene bond, a (N-N) bond, an E-alkene bond or also a -CH=CH- bond.

The invention also encompasses a TBC-1 polypeptide or a fragment or a variant thereof in
which at least one peptide bond has been modified as described above.

The polypeptides according to the invention may also be prepared by the conventional methods 
of chemical synthesis, either in a homogeneous solution or in solid phase. As an illustrative
embodiment of such chemical polypeptide synthesis techniques, the
homogeneous solution technique described by Houben-Weyl (1974) may be cited.

The TBC-1 polypeptide, or a fragment or a variant thereof may thus be prepared by chemical 
synthesis in liquid or solid phase by successive couplings of the different amino acid residues to
be incorporated (from the N-terminal end to the C-terminal end in liquid phase, or from the
C-terminal end to the N-terminal end in solid phase) wherein the N-terminal ends and the reactive
side chains are previously blocked by conventional groups. 

For solid phase synthesis the technique described by Merrifield (1965) may be used in 
particular. 



J. ANTIBODIES 

The TBC-1 polypeptide of SEQ ID No 5 can be used for the preparation of polyclonal or 
monoclonal antibodies. 

The TBC-1 polypeptide expressed from a DNA sequence comprising at least one of the nucleic 
acid sequences of SEQ ID Nos 1, 3 and 4 may also be used to generate antibodies capable of 
specifically binding to the TBC-1 polypeptide of SEQ ID No 5. 

Preferred antibodies according to the invention are prepared using TBC-1 peptide fragments that 
do not comprise the EVGYCQGL amino acid motif.

Other preferred antibodies of the invention are prepared using TBC-1 peptide fragments that do not 
comprise the TBC domain defined elsewhere in the specification. 

The antibodies may be prepared from hybridomas according to the technique described by Kohler 
and Milstein in 1975. The polyclonal antibodies may be prepared by immunization of a mammal, 
especially a mouse or a rabbit, with a polypeptide according to the invention that is combined with 
an adjuvant of immunity, and then by purifying the specific antibodies contained in the serum of
the immunized animal on an affinity chromatography column on which the polypeptide that has
been used as the antigen has previously been immobilized.

The present invention also includes chimeric single chain Fv antibody fragments (Martineau et al.,
1998), antibody fragments obtained through phage display libraries (Ridder et al., 1995; Vaughan 
et al., 1995) and humanized antibodies (Reinmann et al., 1997; Leger et al., 1997). 

Antibody preparations prepared according to either protocol are useful in quantitative 
immunoassays which determine concentrations of antigen-bearing substances in biological
samples; they are also used semi-quantitatively or qualitatively to identify the presence of antigen
in a biological sample. The antibodies may also be used in therapeutic compositions for killing
cells expressing the protein or reducing the levels of the protein in the body.

Consequently, the invention is also directed to a method for detecting specifically the presence of a
TBC-1 polypeptide according to the invention in a biological sample, said method comprising the 
following steps : 

a) bringing into contact the biological sample with an antibody according to the invention; 

b) detecting the antigen-antibody complex formed. 

The invention also concerns a diagnostic kit for detecting in vitro the presence of a TBC-1 
polypeptide according to the present invention in a biological sample, wherein said kit comprises : 

a) a polyclonal or monoclonal antibody as described above, optionally labeled; 

b) a reagent allowing the detection of the antigen-antibody complexes formed, said reagent 
optionally carrying a label, or being able to be recognized itself by a labeled reagent, more 
particularly in the case when the above-mentioned monoclonal or polyclonal antibody is not labeled 
by itself. 

K. SCREENING OF AGENTS ACTING AGAINST PROSTATE CANCER 

In a further embodiment, the present invention also concerns a method for the screening of new 
agents, or candidate substances, acting against prostate cancer and which may be suitable for 
the treatment of a patient whose DNA comprises an allele of the TBC-1 gene associated with 
prostate cancer, with an early onset of prostate cancer, or with the aggressiveness of prostate 
cancer tumors, or more generally with a modified or forthcoming expression of the TBC-1 gene, 
with a modified or forthcoming production of the TBC-1 protein, or with the production of a 
modified TBC-1 protein. 



In a preferred embodiment, the invention relates to a method for the screening of candidate 
substances for prostate cancer treatment. The method comprises the following steps: 

- providing a cell line, an organ, or a mammal expressing a TBC-1 gene or a fragment thereof, 
preferably the regulatory region or the promoter region of the TBC-1 gene, 

- obtaining a candidate substance, preferably a candidate substance capable of inhibiting the 
binding of a transcription factor to the TBC-1 regulatory region, 

- testing the ability of the candidate substance to decrease the symptoms of prostate cancer 
and/or to modulate the expression levels of TBC-1. 

In some embodiments, the cell line, organ or mammal expresses a heterologous protein, the 
coding sequence of which is operably linked to the TBC-1 regulatory or promoter sequence. In 
other embodiments, they express a TBC-1 gene comprising alleles of one or more biallelic 
markers associated with prostate cancer, an early onset of prostate cancer, or the 
aggressiveness of prostate cancer tumors, or a mutated TBC-1 gene comprising a trait causing 
mutation determined using the above-noted method. In a further embodiment, a mouse 
expressing a TBC-1 protein or a variant or a fragment thereof, more particularly encoded by a 
nucleic acid comprising at least one of the biallelic markers according to the invention, can be 
used to screen agents acting against prostate cancer. 

A candidate substance is a substance which can interact with or modulate, by binding or other 
intramolecular interactions, expression, stability, and function of TBC-1. Such substances may 
be potentially interesting for patients who are not responsive to existing drugs or develop side 
effects to them. Screening may be effected using either in vitro methods or in vivo methods. 

Such methods can be carried out in numerous ways such as on transformed cells which 
express the considered alleles of the TBC-1 gene, on tumors induced by said transformed cells, 
for example in mice, or on a TBC-1 protein encoded by the considered allelic variant of TBC-1. 

Screening assays of the present invention generally involve determining the ability of a 
candidate substance to present a cytotoxic effect, to change the characteristics of transformed 
cells such as proliferative and invasive capacity, to affect the tumor growth, or to modify the 
expression level of TBC-1. 

Typically, this method includes preparing transformed cells with different forms of TBC-1 
sequences containing particular alleles of one or more biallelic markers and/or trait causing 
mutations described above. This is followed by testing the cells expressing the TBC-1 with a 
candidate substance to determine the ability of the substance to present a cytotoxic effect, to 
affect the characteristics of transformed cells, the tumor growth, or to modify the expression 
level of TBC-1. 

Typical examples of such drug screening assays are provided below. It is to be understood that 
the parameters set forth in these examples can be modified by the skilled person without undue 
experimentation. 

Methods for screening substances interacting with a TBC-1 polypeptide 

For the purpose of the present invention, a ligand means a molecule, such as a protein, a 
peptide, an antibody or any synthetic chemical compound capable of binding to the TBC-1 
protein or one of its fragments or variants or to modulate the expression of the polynucleotide 
coding for TBC-1 or a fragment or variant thereof. 

In the ligand screening method according to the present invention, a biological sample or a 
defined molecule to be tested as a putative ligand of the TBC-1 protein is brought into contact 
with a purified TBC-1 protein, for example a purified recombinant TBC-1 protein produced by a 
recombinant cell host as described hereinbefore, in order to form a complex between the TBC-1 
protein and the putative ligand molecule to be tested. 

A. Candidate ligands obtained from random peptide libraries 

In a particular embodiment of the screening method, the putative ligand is the expression 
product of a DNA insert contained in a phage vector (Parmley and Smith, 1988). Specifically, 
random peptide phage libraries are used. The random DNA inserts encode peptides of 8 to 20 
amino acids in length (Oldenburg K.R. et al., 1992; Valadon P. et al., 1996; Lucas A.H., 1994; 
Westerink M.A.J., 1995; Castagnoli L. et al., 1991). According to this particular embodiment, the 
recombinant phages expressing a protein that binds to the immobilized TBC-1 protein are 
retained and the complex formed between the TBC-1 protein and the recombinant phage may 
be subsequently immunoprecipitated by a polyclonal or a monoclonal antibody directed against 
the TBC-1 protein. 

Once the ligand library in recombinant phages has been constructed, the phage population is 
brought into contact with the immobilized TBC-1 protein. Then the preparation of complexes is 
washed in order to remove the non-specifically bound recombinant phages. The phages that 
bind specifically to the TBC-1 protein are then eluted by a buffer (acid pH) or 
immunoprecipitated by the anti-TBC-1 monoclonal antibody produced by a hybridoma, and this 
phage population is subsequently amplified by an over-infection of bacteria (for example E. coli). 
The selection step may be repeated several times, preferably 2-4 times, in order to select the 
more specific recombinant phage clones. The last step consists in characterizing the peptide 
produced by the selected recombinant phage clones either by expression in infected bacteria 
and isolation, expressing the phage insert in another host-vector system, or sequencing the 
insert contained in the selected recombinant phages. 
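
The benefit of repeating the selection step 2-4 times can be illustrated with a simple enrichment 
model. The sketch below is not part of the described method: it assumes that each round retains a 
fixed fraction of specific binders and a much smaller fixed fraction of non-specific phages, and 
that amplification in bacteria preserves the proportions. The function name and all numerical 
values are hypothetical. 

```python
def enrichment_per_round(initial_fraction, retain_specific, retain_background, rounds):
    """Return the fraction of specific binders in the phage pool after each round.

    initial_fraction  : fraction of specific binders in the starting library (assumed)
    retain_specific   : fraction of specific phages surviving the wash (assumed)
    retain_background : fraction of non-specific phages surviving the wash (assumed)
    """
    fraction = initial_fraction
    history = []
    for _ in range(rounds):
        kept_specific = fraction * retain_specific
        kept_background = (1.0 - fraction) * retain_background
        # Amplification is assumed to scale both populations equally,
        # so only the ratio after washing matters.
        fraction = kept_specific / (kept_specific + kept_background)
        history.append(fraction)
    return history


if __name__ == "__main__":
    # Hypothetical numbers: one binder per million phages, four rounds.
    for i, f in enumerate(enrichment_per_round(1e-6, 0.5, 0.001, 4), start=1):
        print(f"round {i}: specific fraction = {f:.4f}")
```

Under these assumed retention values the specific fraction rises with every round, which is 
consistent with the preference for several rounds of selection. 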

B. Candidate ligands obtained through a two-hybrid screening assay. 

The yeast two-hybrid system is designed to study protein-protein interactions in vivo (Fields and 
Song, 1989), and relies upon the fusion of a bait protein to the DNA binding domain of the yeast 
Gal4 protein. This technique is also described in US Patent No. 5,667,973 and US Patent No. 
5,283,173 (Fields et al.), the technical teachings of both patents being herein incorporated by 
reference. 

The general procedure of library screening by the two-hybrid assay may be performed as 
described by Harper et al. (Harper JW et al., 1993) or as described by Cho et al. (1998) or also 
Fromont-Racine et al. (1997). 

The bait protein or polypeptide consists of a TBC-1 polypeptide or a fragment or variant thereof. 

More precisely, the nucleotide sequence encoding the TBC-1 polypeptide or a fragment or 
variant thereof is fused to a polynucleotide encoding the DNA binding domain of the GAL4 
protein, the fused nucleotide sequence being inserted in a suitable expression vector, for 
example pAS2 or pM3. 

Then, a human cDNA library is constructed in a specially designed vector, such that the human 
cDNA insert is fused to a nucleotide sequence in the vector that encodes the transcriptional 
activation domain of the GAL4 protein. Preferably, the vector used is the pACT vector. The 
polypeptides encoded by the nucleotide inserts of the human cDNA library are termed "prey" 
polypeptides. 

A third vector contains a detectable marker gene, such as beta galactosidase gene or CAT 
gene that is placed under the control of a regulation sequence that is responsive to the binding 
of a complete Gal4 protein containing both the transcriptional activation domain and the DNA 
binding domain. For example, the vector pG5EC may be used. 

Two different yeast strains are also used. As an illustrative but non limiting example the two 
different yeast strains may be the following : 

- Y190, the phenotype of which is (MATa, Leu2-3, 112 ura3-12, trp1-901, his3-Δ200, ade2- 
101, gal4Δ gal80Δ URA3 GAL-LacZ, LYS GAL-HIS3, cyhr); 

- Y187, the phenotype of which is (MATα gal4 gal80 his3 trp1-901 ade2-101 ura3-52 leu2-3,- 
112 URA3 GAL-lacZ met-), which is the opposite mating type of Y190. 

Briefly, 20 µg of pAS2/TBC-1 and 20 µg of pACT-cDNA library are co-transformed into yeast 
strain Y190. The transformants are selected for growth on minimal media lacking histidine, 
leucine and tryptophan, but containing the histidine synthesis inhibitor 3-AT (50 mM). Positive 
colonies are screened for beta galactosidase by filter lift assay. The double positive colonies 
(His+, beta-gal+) are then grown on plates lacking histidine and leucine, but containing tryptophan 
and cycloheximide (10 mg/ml) to select for loss of pAS2/TBC-1 plasmids but retention of pACT- 
cDNA library plasmids. The resulting Y190 strains are mated with Y187 strains expressing TBC- 
1 or non-related control proteins, such as cyclophilin B, lamin, or SNF1, as Gal4 fusions as 
described by Harper et al. (1993) and by Bram et al. (1993), and screened for beta 
galactosidase by filter lift assay. Yeast clones that are beta gal- after mating with the control 
Gal4 fusions are considered false positives. 

In another embodiment of the two-hybrid method according to the invention, the interaction 
between TBC-1 or a fragment or variant thereof with cellular proteins may be assessed using 
the Matchmaker Two Hybrid System 2 (Catalog No. K1604-1, Clontech). As described in the 
manual accompanying the Matchmaker Two Hybrid System 2 (Catalog No. K1604-1, Clontech), 
the disclosure of which is incorporated herein by reference, nucleic acids encoding the TBC-1 
protein or a portion thereof are inserted into an expression vector such that they are in frame with 
DNA encoding the DNA binding domain of the yeast transcriptional activator GAL4. A desired 
cDNA, preferably human cDNA, is inserted into a second expression vector such that it is in 
frame with DNA encoding the activation domain of GAL4. The two expression plasmids are 
transformed into the yeast cells and the yeast cells are plated on selection medium which selects 
for expression of selectable markers on each of the expression vectors as well as GAL4 dependent 
expression of the HIS3 gene. Transformants capable of growing on medium lacking histidine are 
screened for GAL4 dependent lacZ expression. Those cells which are positive in both the histidine 
selection and the lacZ assay are those in which an interaction between TBC-1 and the protein or 
peptide encoded by the initially selected cDNA insert has taken place. 

Method for screening ligands that modulate the expression of the TBC-1 gene. 

Another subject of the present invention is a method for screening molecules that modulate the 
expression of the TBC-1 protein. Such a screening method comprises the steps of : 

a) cultivating a prokaryotic or a eukaryotic cell that has been transfected with a nucleotide 
sequence encoding the TBC-1 protein, operably linked to a TBC-1 5'-regulatory sequence; 

b) bringing into contact the cultivated cell with a molecule to be tested; 

c) quantifying the expression of the TBC-1 protein. 

Using DNA recombination techniques well known to those skilled in the art, the TBC-1 protein 
encoding DNA sequence is inserted into an expression vector, downstream from a TBC-1 
5'-regulatory sequence that contains a TBC-1 promoter sequence. As an illustrative example, the 
promoter sequence of the TBC-1 gene is contained in the nucleic acid of SEQ ID No 2. 

The quantification of the expression of the TBC-1 protein may be realized either at the mRNA 
level or at the protein level. In the latter case, polyclonal or monoclonal antibodies may be used 
to quantify the amounts of the TBC-1 protein that have been produced, for example in an ELISA 
or a RIA assay. 

In a preferred embodiment, the quantification of the TBC-1 mRNAs is realized by a quantitative 
PCR amplification of the cDNAs obtained by a reverse transcription of the total mRNA of the 
cultivated TBC-1-transfected host cell, using a pair of primers specific for TBC-1. 

Expression levels and patterns of TBC-1 may be analyzed by solution hybridization with long 
probes as described in International Patent Application No. WO 97/05277, the entire contents of 
which are incorporated herein by reference. Briefly, the TBC-1 cDNA or the TBC-1 genomic DNA 
described above, or fragments thereof, is inserted at a cloning site immediately downstream of a 
bacteriophage (T3, T7 or SP6) RNA polymerase promoter to produce antisense RNA. Preferably, 
the TBC-1 insert comprises at least 100 or more consecutive nucleotides of the genomic DNA 
sequence or the cDNA sequences, particularly those comprising one of the nucleotide sequences 
of SEQ ID Nos 3, 4 and 6-8 or those encoding a mutated TBC-1. The plasmid is linearized and 
transcribed in the presence of ribonucleotides comprising modified ribonucleotides (i.e. biotin-UTP 
and DIG-UTP). An excess of this doubly labeled RNA is hybridized in solution with mRNA isolated 
from cells or tissues of interest. The hybridizations are performed under standard stringent 
conditions (40-50°C for 16 hours in an 80% formamide, 0.4 M NaCl buffer, pH 7-8). The 
unhybridized probe is removed by digestion with ribonucleases specific for single-stranded RNA 
(i.e. RNases CL3, T1, Phy M, U2 or A). The presence of the biotin-UTP modification enables 
capture of the hybrid on a microtitration plate coated with streptavidin. The presence of the DIG 
modification enables the hybrid to be detected and quantified by ELISA using an anti-DIG antibody 
coupled to alkaline phosphatase. 

Quantitative analysis of TBC-1 gene expression may also be performed using arrays. As used 
herein, the term array means a one dimensional, two dimensional, or multidimensional 
arrangement of a plurality of nucleic acids of sufficient length to permit specific detection of 
expression of mRNAs capable of hybridizing thereto. For example, the arrays may contain a 
plurality of nucleic acids derived from genes whose expression levels are to be assessed. The 
arrays may include the TBC-1 genomic DNA, the TBC-1 cDNA sequences or the sequences 
complementary thereto or fragments thereof, particularly those comprising at least one of the 
biallelic markers according to the present invention, preferably at least one of SEQ ID Nos 7-8 or 
those comprising a trait causing mutation. Preferably, the fragments are at least 15 nucleotides in 
length. In other embodiments, the fragments are at least 25 nucleotides in length. In some 
embodiments, the fragments are at least 50 nucleotides in length. More preferably, the 
fragments are at least 100 nucleotides in length. In another preferred embodiment, the 
fragments are more than 100 nucleotides in length. In some embodiments the fragments may 
be more than 500 nucleotides in length. 

For example, quantitative analysis of TBC-1 gene expression may be performed with a 
complementary DNA microarray as described by Schena et al. (1995). Full length TBC-1 cDNAs 
or fragments thereof are amplified by PCR and arrayed from a 96-well microtiter plate onto 
silylated microscope slides using high-speed robotics. Printed arrays are incubated in a humid 
chamber to allow rehydration of the array elements and rinsed, once in 0.2% SDS for 1 min, 
twice in water for 1 min and once for 5 min in sodium borohydride solution. The arrays are 
submerged in water for 2 min at 95°C, transferred into 0.2% SDS for 1 min, rinsed twice with 
water, air dried and stored in the dark at 25°C. 

Cell or tissue mRNA is isolated or commercially obtained and probes are prepared by a single 
round of reverse transcription. Probes are hybridized to 1 cm² microarrays under a 14 x 14 mm 
glass coverslip for 6-12 hours at 60°C. Arrays are washed for 5 min at 25°C in low stringency 
wash buffer (1 x SSC/0.2% SDS), then for 10 min at room temperature in high stringency wash 
buffer (0.1 x SSC/0.2% SDS). Arrays are scanned in 0.1 x SSC using a fluorescence laser 
scanning device fitted with a custom filter set. Accurate differential expression measurements 
are obtained by taking the average of the ratios of two independent hybridizations. 
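
The final averaging step can be sketched as follows. This is a minimal illustration only: it 
assumes that each hybridization yields one signal intensity for the sample of interest and one 
for a reference sample on the same array element, and the function names and intensity values 
are hypothetical. 

```python
def expression_ratio(sample_signal, reference_signal):
    """Ratio of sample to reference intensity for one array element."""
    if reference_signal <= 0:
        raise ValueError("reference signal must be positive")
    return sample_signal / reference_signal


def mean_expression_ratio(hybridization_1, hybridization_2):
    """Average the ratios of two independent hybridizations.

    Each argument is a (sample_signal, reference_signal) pair.
    """
    ratio_1 = expression_ratio(*hybridization_1)
    ratio_2 = expression_ratio(*hybridization_2)
    return (ratio_1 + ratio_2) / 2.0


if __name__ == "__main__":
    # Hypothetical intensities for a TBC-1 array element.
    print(mean_expression_ratio((200.0, 100.0), (300.0, 100.0)))
```

Averaging the ratios, rather than the raw intensities, keeps each hybridization normalized to 
its own reference before the two measurements are combined. 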

Quantitative analysis of TBC-1 gene expression may also be performed with full length TBC-1 
cDNAs or fragments thereof in complementary DNA arrays as described by Pietu et al. (1996). 
The full length TBC-1 cDNA or fragments thereof is PCR amplified and spotted on membranes. 
Then, mRNAs originating from various tissues or cells are labeled with radioactive nucleotides. 
After hybridization and washing in controlled conditions, the hybridized mRNAs are detected by 
phospho-imaging or autoradiography. Duplicate experiments are performed and a quantitative 
analysis of differentially expressed mRNAs is then performed. 

Alternatively, expression analysis using the TBC-1 genomic DNA, the TBC-1 cDNAs, or 
fragments thereof can be done through high density nucleotide arrays or chips as described by 
Lockhart et al. (1996) and Sosnowski et al. (1997). Oligonucleotides of 15-50 nucleotides from 
the sequences of the TBC-1 genomic DNA, the TBC-1 cDNA sequences, particularly those 
comprising at least one of the biallelic markers according to the present invention, preferably at 
least one of SEQ ID Nos 7-8 or those comprising the trait causing mutation, or the sequences 
complementary thereto, are synthesized directly on the chip (Lockhart et al., supra) or 
synthesized and then addressed to the chip (Sosnowski et al., supra). Preferably, the 
oligonucleotides are about 20 nucleotides in length. 

TBC-1 cDNA probes labeled with an appropriate compound, such as biotin, digoxigenin or 
fluorescent dye, are synthesized from the appropriate mRNA population and then randomly 
fragmented to an average size of 50 to 100 nucleotides. The said probes are then hybridized to 
the chip. After washing as described in Lockhart et al., supra and application of different electric 
fields (Sosnowski et al., 1997), the dyes or labeling compounds are detected and quantified. 
Duplicate hybridizations are performed. Comparative analysis of the intensity of the signal 
originating from cDNA probes on the same target oligonucleotide in different cDNA samples 
indicates a differential expression of TBC-1 mRNAs. 

Screening using transgenic animals 

In vivo methods can utilize transgenic animals for drug screening. Nucleic acids including at 
least one of the biallelic polymorphisms of interest can be used to generate genetically modified 
non-human animals or to generate site specific gene modifications in cell lines. The term 
"transgenic" is intended to encompass genetically modified animals having a deletion or other 
knock-out of TBC-1 gene activity, having an exogenous TBC-1 gene that is stably transmitted in 
the host cells, or having an exogenous TBC-1 promoter operably linked to a reporter gene. 
Transgenic animals may be made through homologous recombination, where the TBC-1 locus 
is altered. Alternatively, a nucleic acid construct is randomly integrated into the genome. Vectors 
for stable integration include for example plasmids, retroviruses and other animal viruses, and 
YACs. Of interest are transgenic mammals, e.g. cows, pigs, goats, horses, and particularly 
rodents such as rats and mice. Transgenic animals allow the study of both the efficacy and the 
toxicity of the candidate drug. 



Methods for inhibiting the expression of a TBC-1 gene 

Other therapeutic compositions according to the present invention comprise advantageously an 
oligonucleotide fragment of the nucleic sequence of TBC-1 as an antisense tool that inhibits the 
expression of the corresponding TBC-1 gene. Preferred methods using antisense 
polynucleotide according to the present invention are the procedures described by Sczakiel et 
al. (1995). 

Preferably, the antisense tools are chosen among the polynucleotides (15-200 bp long) that 
are complementary to the 5' end of the TBC-1 mRNA. In another embodiment, a combination of 
different antisense polynucleotides complementary to different parts of the desired targeted 
gene is used. 

Preferred antisense polynucleotides according to the present invention are complementary to a 
sequence of the mRNAs of TBC-1 that contains the translation initiation codon ATG. 

The antisense nucleic acid molecules to be used in gene therapy may be either DNA or RNA 
sequences. They comprise a nucleotide sequence complementary to the targeted sequence of 
the TBC-1 genomic DNA, the sequence of which can be determined using one of the detection 
methods of the present invention. The targeted DNA or RNA sequence preferably comprises at 
least one of the biallelic markers according to the present invention, particularly a nucleotide 
sequence selected from the group consisting of SEQ ID Nos 7-14, or comprises a trait causing 
mutation. In a preferred embodiment, the antisense oligonucleotides are able to hybridize with at 
least one of the splicing sites of the targeted TBC-1 gene, with the 3'UTR or the 5'UTR, with 
exon 6bis, or with an exonic region comprising at least one of the biallelic markers of the 
present invention or comprising a trait causing mutation. The antisense nucleic acids should 
have a length and melting temperature sufficient to permit formation of an intracellular duplex 
having sufficient stability to inhibit the expression of the TBC-1 mRNA in the duplex. Strategies 
for designing antisense nucleic acids suitable for use in gene therapy are disclosed in Green et 
al. (1986) and Izant and Weintraub (1984), the disclosures of which are incorporated herein by 
reference. 
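
As an illustration of the complementarity and melting temperature considerations, the sketch 
below derives the antisense (reverse complement) strand of a short sense sequence and 
estimates its melting temperature. It is a rough aid only, under stated assumptions: the target 
used is simply the ATG-containing primer sequence of SEQ ID No 13, taken here as a 
hypothetical sense-strand fragment, and the melting temperature uses the Wallace rule 
(2°C per A/T, 4°C per G/C), which is a crude approximation meant for short oligonucleotides 
and is not a method described in this specification. 

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(sense):
    """Return the antisense DNA strand, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(sense.upper()))


def wallace_tm(oligo):
    """Approximate melting temperature in °C (Wallace rule, short oligos only)."""
    oligo = oligo.upper()
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    # Hypothetical target: sequence of SEQ ID No 13, which contains an ATG.
    target = "TGACCACCATGCCCATGCT"
    antisense = reverse_complement(target)
    print(antisense, wallace_tm(antisense))
```

For longer antisense polynucleotides, a nearest-neighbor calculation under the intended 
intracellular conditions would be more appropriate than this rule of thumb. 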

In some strategies, antisense molecules are obtained by reversing the orientation of the TBC-1 
coding region with respect to a promoter so as to transcribe the opposite strand from that which 
is normally transcribed in the cell. The antisense molecules may be transcribed using in vitro 
transcription systems such as those which employ T7 or SP6 polymerase to generate the 
transcript. Another approach involves transcription of TBC-1 antisense nucleic acids in vivo by 
operably linking DNA containing the antisense sequence to a promoter in a suitable expression 
vector. 

Alternatively, suitable antisense strategies are those described by Rossi et al. (1991), in the 
International Applications Nos. WO 94/23026, WO 95/04141, WO 92/18522 and in the 
European Patent Application No. EP 0 572 287 A2. 

An alternative to the antisense technology that is used according to the present invention 
consists in using ribozymes that will bind to a target sequence via their complementary 
polynucleotide tail and that will cleave the corresponding RNA by hydrolyzing its target site 
(namely "hammerhead ribozymes"). Briefly, the simplified cycle of a hammerhead ribozyme 
consists of (1) sequence specific binding to the target RNA via complementary antisense 
sequences; (2) site-specific hydrolysis of the cleavable motif of the target strand; and (3) release 
of cleavage products, which gives rise to another catalytic cycle. Indeed, the use of long-chain 
antisense polynucleotides (at least 30 bases long) or ribozymes with long antisense arms is 
advantageous. A preferred delivery system for antisense ribozymes is achieved by covalently 
linking these antisense ribozymes to lipophilic groups or by using liposomes as a convenient 
vector. Preferred antisense ribozymes according to the present invention are prepared as 
described by Sczakiel et al. (1995), the specific preparation procedures referred to in said 
article being herein incorporated by reference. 



EXAMPLES 

EXAMPLE 1 : Analysis of the first mRNA encoding a TBC-1 polypeptide synthesized by 
the cells. 

TBC-1 cDNA was obtained as follows : 4 µl of ethanol suspension containing 1 mg of human 
prostate total RNA (Clontech Laboratories, Inc., Palo Alto, USA; Catalogue N. 64038-1) was 
centrifuged, and the resulting pellet was air dried for 30 minutes at room temperature. 

First strand cDNA synthesis was performed using the Advantage™ RT-for-PCR kit (Clontech 
Laboratories Inc., catalogue N. K1402-1). 1 µl of 20 mM solution of a specific oligo dT primer 
was added to 12.5 µl of RNA solution in water, heated at 74°C for 2.5 min and rapidly quenched 
in an ice bath. 10 µl of 5 x RT buffer (50 mM Tris-HCl, pH 8.3, 75 mM KCl, 3 mM MgCl2), 2.5 µl 
of dNTP mix (10 mM each) and 1.25 µl of human recombinant placental RNase inhibitor were mixed 
with 1 µl of MMLV reverse transcriptase (200 units). 6.5 µl of this solution were added to the RNA- 
primer mix and incubated at 42°C for one hour. 80 µl of water were added and the solution was 
incubated at 94°C for 5 minutes. 

An aliquot of the resulting solution was used in a Long Range PCR reaction with hot start, in 50 µl final 
volume, using 2 units of rTth XL and 20 pmol/µl of each of 5'-TGACCACCATGCCCATGCT-3' (SEQ 
ID No 13) and 5'-GCATTTATTCACGTCCACGCC-3' (SEQ ID No 14) primers with 35 cycles of 
elongation for 6 minutes at 67°C in a thermocycler. 

The amplification products corresponding to both cDNA strands were partially sequenced in 
order to ensure the specificity of the amplification reaction. 

Results of Northern blot analysis of prostate mRNAs supported the existence of the first TBC-1 
cDNA having about 4 Kb in length, which is the nucleotide sequence of SEQ ID No 3. 

Example 2 : Detection of TBC-1 biallelic markers: DNA extraction 

Donors were unrelated and healthy. They presented a sufficient diversity for being representative of a 
French heterogeneous population. The DNA from 100 individuals was extracted and tested for the 
detection of the biallelic markers. 

30 ml of peripheral venous blood were taken from each donor in the presence of EDTA. Cells 
(pellet) were collected after centrifugation for 10 minutes at 2000 rpm. Red cells were lysed by a 
lysis solution (50 ml final volume : 10 mM Tris pH 7.6; 5 mM MgCl2; 10 mM NaCl). The solution 
was centrifuged (10 minutes, 2000 rpm) as many times as necessary to eliminate the residual 
red cells present in the supernatant, after resuspension of the pellet in the lysis solution. 

The pellet of white cells was lysed overnight at 42°C with 3.7 ml of lysis solution composed of: 

- 3 ml TE 10-2 (Tris-HCl 10 mM, EDTA 2 mM) / NaCl 0.4 M 

- 200 µl SDS 10% 

- 500 µl K-proteinase (2 mg K-proteinase in TE 10-2 / NaCl 0.4 M). 

For the extraction of proteins, 1 ml saturated NaCl (6M) (1/3.5 v/v) was added. After vigorous 
agitation, the solution was centrifuged for 20 minutes at 10000 rpm. 

For the precipitation of DNA, 2 to 3 volumes of 100% ethanol were added to the previous 
supernatant, and the solution was centrifuged for 30 minutes at 2000 rpm. The DNA solution 
was rinsed three times with 70% ethanol to eliminate salts, and centrifuged for 20 minutes at 
2000 rpm. The pellet was dried at 37°C, and resuspended in 1 ml TE 10-1 or 1 ml water. The 
DNA concentration was evaluated by measuring the OD at 260 nm (1 unit OD = 50 µg/ml DNA). 
To determine the presence of proteins in the DNA solution, the OD 260 / OD 280 ratio was 
determined. Only DNA preparations having an OD 260 / OD 280 ratio between 1.8 and 2 were 
used in the subsequent examples described below. 
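
The concentration and purity checks described above can be sketched as follows. The 
conversion factor (1 OD unit = 50 µg/ml DNA) and the acceptance window (1.8 to 2) come from the 
text; the dilution factor parameter, the function names and the example readings are 
assumptions added for illustration. 

```python
UG_PER_ML_PER_OD260 = 50.0  # 1 OD unit at 260 nm = 50 µg/ml DNA (from the text)


def dna_concentration_ug_per_ml(od260, dilution_factor=1.0):
    """DNA concentration of the stock; dilution_factor is an assumed parameter."""
    return od260 * UG_PER_ML_PER_OD260 * dilution_factor


def purity_ratio(od260, od280):
    """OD 260 / OD 280 ratio used to detect protein contamination."""
    if od280 <= 0:
        raise ValueError("OD 280 must be positive")
    return od260 / od280


def is_acceptable(od260, od280, low=1.8, high=2.0):
    """True when the ratio lies within the window stated in the text."""
    return low <= purity_ratio(od260, od280) <= high


if __name__ == "__main__":
    # Hypothetical readings on a 1:20 dilution of the DNA stock.
    print(dna_concentration_ug_per_ml(0.5, dilution_factor=20))
    print(is_acceptable(0.95, 0.5))
```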


The pool was constituted by mixing equivalent quantities of DNA from each individual. 

Example 3 : Detection of the biallelic markers: amplification of genomic DNA by PCR 

The amplification of specific genomic sequences of the DNA samples of example 2 was carried 
out on the pool of DNA obtained previously. In addition, 50 individual samples were similarly 
amplified. 

PCR assays were performed using the following protocol: 

Final volume                                           25 µl 
DNA                                                    2 ng/µl 
MgCl2                                                  2 mM 
dNTP (each)                                            200 µM 
primer (each)                                          2.9 ng/µl 
AmpliTaq Gold DNA polymerase                           0.05 unit/µl 
PCR buffer (10x = 0.1 M Tris-HCl pH 8.3, 0.5 M KCl)    1x 
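
For bench planning, the final concentrations listed in the protocol can be converted into total 
amounts per 25 µl reaction. The sketch below only multiplies the listed concentrations by the 
reaction volume; the dictionary keys and function names are illustrative, and the dNTP unit 
(µM) is an assumption, since the unit is not legible in the source. 

```python
FINAL_VOLUME_UL = 25.0

# Final concentrations per µl of reaction, as listed in the protocol.
PER_UL = {
    "DNA (ng)": 2.0,
    "primer, each (ng)": 2.9,
    "AmpliTaq Gold (units)": 0.05,
}

# Molar final concentrations in mM. The dNTP value assumes the unit is µM.
MOLAR_MM = {
    "MgCl2": 2.0,
    "dNTP, each": 0.2,  # 200 µM, assumed unit
}


def totals_per_reaction(volume_ul=FINAL_VOLUME_UL):
    """Total mass or units of each component in one reaction."""
    return {name: per_ul * volume_ul for name, per_ul in PER_UL.items()}


def nmol_per_reaction(volume_ul=FINAL_VOLUME_UL):
    """Total nmol of each molar component (1 mM equals 1 nmol per µl)."""
    return {name: mm * volume_ul for name, mm in MOLAR_MM.items()}


if __name__ == "__main__":
    for name, amount in {**totals_per_reaction(), **nmol_per_reaction()}.items():
        print(f"{name}: {amount:g}")
```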

Each pair of first primers was designed using the sequence information of the TBC-1 gene 
disclosed herein and the OSP software (Hillier & Green, 1991). This first pair of primers was 
about 20 nucleotides in length and had the sequences disclosed in Table 1 in the columns 
labeled PU and RP. 

Table 1 

Amplicon   Amplified region of TBC-1 gene   PU            RP 
1          intron 1                         SEQ ID No 9   SEQ ID No 10 



Preferably, the primers contained a common oligonucleotide tail upstream of the specific bases 
targeted for amplification which was useful for sequencing. 

Primers PU contain the following additional PU 5' sequence : TGTAAAACGACGGCCAGT; 
primers RP contain the following RP 5' sequence : CAGGAAACAGCTATGACC. The primer 
containing the additional PU 5' sequence is listed in SEQ ID No 9. The primer containing the 
additional RP 5' sequence is listed in SEQ ID No 10. 

The synthesis of these primers was performed following the phosphoramidite method, on a 
GENSET UFPS 24.1 synthesizer. 

DNA amplification was performed on a Genius II thermocycler. After heating at 95°C for 10 min, 
40 cycles were performed. Each cycle comprised: 30 sec at 95°C, 54°C for 1 min, and 30 sec at 
72°C. For final elongation, 10 min at 72°C ended the amplification. The quantities of the 
amplification products obtained were determined on 96-well microtiter plates, using a 
fluorometer and Picogreen as intercalant agent (Molecular Probes). 

Example 4 : Detection of the biallelic markers: sequencing of amplified genomic DNA and 
identification of polymorphisms. 

The sequencing of the amplified DNA obtained in example 3 was carried out on ABI 377 
sequencers. The sequences of the amplification products were determined using automated 
dideoxy terminator sequencing reactions with a dye terminator cycle sequencing protocol. The 
products of the sequencing reactions were run on sequencing gels and the sequences were 
determined using gel image analysis [ABI Prism DNA Sequencing Analysis software (2.1.2 
version) and the above mentioned proprietary "Trace" basecaller]. 

The sequence data were further evaluated using the above mentioned polymorphism analysis 
software designed to detect the presence of biallelic markers among the pooled amplified 
fragments. The polymorphism search was based on the presence of superimposed peaks in the 
electrophoresis pattern resulting from different bases occurring at the same position as 
described previously. 

One fragment of amplification was analyzed. In this segment, 1 biallelic marker was detected. 
The localization of the biallelic marker is as shown in Table 2. 



Table 2 

Amplicon  Marker Name  Localization in TBC-1 gene  Polymorphism  Major allele  Minor allele 
1         99-430-352   Intron 1                    C/T           SEQ ID No 7   SEQ ID No 8 



Example 5 : Validation of the polymorphisms through microsequencing 

The biallelic markers identified in example 4 were further confirmed and their respective 
frequencies were determined through microsequencing. Microsequencing was carried out for 
each individual DNA sample described in Example 2. 

Amplification from genomic DNA of individuals was performed by PCR as described above for 
the detection of the biallelic markers with the same set of PCR primers (Table 1). 

The preferred primers used in microsequencing were about 23 nucleotides in length and 
hybridized just upstream of the considered polymorphic base. According to the invention, the 
primers used in microsequencing are detailed in Table 3. 

Table 3 

Marker Name  PU microsequencing primer  RP microsequencing primer 
99-430-352   SEQ ID No 11               SEQ ID No 12 



The microsequencing reaction was performed as follows : 

After purification of the amplification products, the microsequencing reaction mixture was 
prepared by adding, in a 20 µl final volume: 10 pmol microsequencing oligonucleotide, 1 U 
Thermosequenase (Amersham E79000G), 1.25 µl Thermosequenase buffer (260 mM Tris HCl 
pH 9.5, 65 mM MgCl2), and the two appropriate fluorescent ddNTPs (Perkin Elmer, Dye 
Terminator Set 401095) complementary to the nucleotides at the polymorphic site of each 
biallelic marker tested, following the manufacturer's recommendations. After 4 minutes at 
94°C, 20 PCR cycles of 15 sec at 55°C, 5 sec at 72°C, and 10 sec at 94°C were carried out in a 
Tetrad PTC-225 thermocycler (MJ Research). The unincorporated dye terminators were then 
removed by ethanol precipitation. Samples were finally resuspended in formamide-EDTA 
loading buffer and heated for 2 min at 95°C before being loaded on a polyacrylamide 
sequencing gel. The data were collected by an ABI PRISM 377 DNA sequencer and processed 
using the GENESCAN software (Perkin Elmer). 

Following gel analysis, data were automatically processed with software that allows the 
determination of the alleles of biallelic markers present in each amplified fragment. 

The software evaluates such factors as whether the intensities of the signals resulting 
from the above microsequencing procedures are weak, normal, or saturated, or whether the 
signals are ambiguous. In addition, the software identifies significant peaks (according to shape 
and height criteria). Among the significant peaks, peaks corresponding to the targeted site are 
identified based on their position. When two significant peaks are detected for the same 
position, each sample is categorized as homozygous or heterozygous based 
on the height ratio. 
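
The height-ratio classification described above can be sketched as follows. This is a minimal illustration, not the proprietary software referred to in the text: the function name call_genotype and the two thresholds (min_height for a significant peak, het_ratio for the heterozygous call) are assumptions chosen for the example, since the text does not state the actual criteria.

```python
def call_genotype(height_c, height_t, min_height=100.0, het_ratio=0.3):
    """Classify one sample from the peak heights of the C and T signals.

    min_height and het_ratio are assumed, illustrative thresholds.
    Returns "C/C", "T/T", "C/T" or "no call".
    """
    c_significant = height_c >= min_height
    t_significant = height_t >= min_height

    # Neither allele gives a usable signal: the sample cannot be typed.
    if not c_significant and not t_significant:
        return "no call"

    # Two significant peaks at the same position: decide on the height ratio.
    if c_significant and t_significant:
        ratio = min(height_c, height_t) / max(height_c, height_t)
        if ratio >= het_ratio:
            return "C/T"

    # Otherwise the sample is homozygous for the allele with the taller peak.
    return "C/C" if height_c > height_t else "T/T"
```

For example, under these assumed thresholds, peak heights of 1000 and 900 are called heterozygous, whereas 1000 and 150 are called homozygous C/C.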

Example 6 : Association study between prostate cancer and the biallelic marker of the 
TBC-1 gene: collection of DNA samples from affected and non-affected individuals 

Affected population : 

The positive trait in this association study was prostate cancer. Prostate cancer patients were 
recruited according to a combination of clinical, histological and biological inclusion criteria. 
Clinical criteria can include rectal examination and prostate biopsies. Biological criteria can 
include PSA assays. The affected individuals were recorded as familial forms when at least two 
persons affected by prostate cancer had been diagnosed in the family. Remaining cases were 
classified as non-familial informative cases (at least two sibs of the case, both aged over 50 
years old, are unaffected), or non-familial uninformative cases (no information about sibs over 50 
years old is available). Cases were also separated according to age at diagnosis : early 
onset prostate cancer (under 65 years old) and late onset prostate cancer (65 years old or 
more). 

The different populations included in the association study of this patent are characterized in 
more detail in Table 4. 



Table 4 

M2S2 
- affected : unrelated individuals 
- unaffected : healthy ind. of + 65 years old 

M3S2 
- affected : individuals diagnosed before 65 years old 
- unaffected : healthy ind. of + 65 years old 

M4S2 
- affected : individuals diagnosed after 65 years old 
- unaffected : healthy ind. of + 65 years old 

M5S2 
- affected : sporadic cases 
- unaffected : healthy ind. of + 65 years old 

M6S2 
- affected : familial cases 
- unaffected : healthy ind. of + 65 years old 



Several populations were studied, as follows : 

1) Unaffected population : 

Control individuals included in this study were checked both for the absence of all clinical and 
biological criteria defining the presence or the risk of prostate cancer (PSA < 4 ng/ml) (WO 
96/21042), and for their age (aged 65 years old or more). All unaffected individuals included in 
the statistical analysis of this sample were unrelated. The unaffected population has a size of 
127 individuals. 



2) Affected populations : 

a) The class M2S2 consisted of 342 unrelated individuals affected by prostate cancer. 

b) The class M3S2 consisted of 136 unrelated individuals affected by prostate cancer who 
had been diagnosed after the age of 65 years old. The patients of this class were included in 
the class M2S2. 

c) The class M4S2 consisted of 206 unrelated individuals affected by prostate cancer who 
had been diagnosed before the age of 65 years old. The patients of this class were included in 
the class M2S2. 

d) The class M5S2 consisted of 176 prostate cancer sporadic cases. 

e) The class M6S2 consisted of 166 unrelated individuals belonging to families wherein at least 
two members had been affected by prostate cancer. 

EXAMPLE 7 : Association study between prostate cancer and the biallelic marker of the 
TBC-1 gene: Genotyping of affected and control individuals. 

The general strategy to perform the association studies was to individually scan the DNA 
samples from all individuals in each of the populations described above in order to establish the 
allele frequencies of biallelic marker 99-430-352 in each of these populations. 

Allelic frequencies of the above-described biallelic marker in each population were determined 
by performing microsequencing reactions on amplified fragments obtained by genomic PCR 
performed on the DNA samples from each individual. Genomic PCR and microsequencing were 
performed as detailed above in examples 3 and 5 using the described PCR and 
microsequencing primers. The results are depicted in Table 5. 

Table 5 




The column labeled "Pop. Size" indicates the number of individuals for which the DNA was 
genotyped using the microsequencing primer having the nucleotide sequence of SEQ ID No 11. 

The columns labeled C/C, T/T and C/T correspond to the respective frequencies of the different 
genotypes for biallelic marker 99-430-352 in the diploid genome of the tested 
individuals. 

The column POP depicts the different populations under consideration that have been 
previously described in Example 6. 


EXAMPLE 8 : Association study between prostate cancer and the biallelic marker of the 
TBC-1 gene. 

Table 6 shows, for each of the populations studied, the frequency of each allele and the 
statistical significance of an imbalance in the frequency of the major allele of the biallelic marker 
99-430-352 between the unaffected population and each of the affected populations described 
in Example 6. 

Table 6 




The column labeled "Pop. Size" indicates the number of individuals for which the DNA was 
genotyped using the microsequencing primer having the nucleotide sequence of SEQ ID No 11. 

The columns labeled "C" and "T" show the relative allele frequencies of biallelic marker 99-430-352, 
the percentage value being expressed for a haploid genome and calculated as follows : 

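a) Frequency (F) of the 99-430-352 biallelic marker allele bearing C as the polymorphic base for 
the M2S2 population : 

$$F_{C,\mathrm{M2S2}} = \frac{(\mathrm{C/C}_{\mathrm{M2S2}} \times \mathrm{Pop.\ Size}_{\mathrm{M2S2}} \times 2) + (\mathrm{C/T}_{\mathrm{M2S2}} \times \mathrm{Pop.\ Size}_{\mathrm{M2S2}})}{\mathrm{Pop.\ Size}_{\mathrm{M2S2}} \times 2}$$

b) Frequency (F) of the 99-430-352 biallelic marker allele bearing T as the polymorphic base for 
the M2S2 population : 

$$F_{T,\mathrm{M2S2}} = \frac{(\mathrm{T/T}_{\mathrm{M2S2}} \times \mathrm{Pop.\ Size}_{\mathrm{M2S2}} \times 2) + (\mathrm{C/T}_{\mathrm{M2S2}} \times \mathrm{Pop.\ Size}_{\mathrm{M2S2}})}{\mathrm{Pop.\ Size}_{\mathrm{M2S2}} \times 2}$$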
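
The allele frequency calculation defined above can be written as a short Python sketch. The function name allele_frequencies is a hypothetical choice, genotype frequencies are assumed to be given as fractions summing to 1, and the numbers in the usage line are made up for illustration; they are not the values of Table 5.

```python
def allele_frequencies(freq_cc, freq_ct, freq_tt, pop_size):
    """Return (F_C, F_T), the haploid allele frequencies for one population.

    freq_cc, freq_ct, freq_tt: genotype frequencies as fractions (assumed to sum to 1).
    pop_size: number of genotyped individuals ("Pop. Size").
    """
    if pop_size <= 0:
        raise ValueError("pop_size must be positive")
    if abs(freq_cc + freq_ct + freq_tt - 1.0) > 1e-6:
        raise ValueError("genotype frequencies must sum to 1")

    chromosomes = pop_size * 2  # two alleles per individual
    # Homozygotes contribute two copies of the allele, heterozygotes one copy.
    f_c = (freq_cc * pop_size * 2 + freq_ct * pop_size) / chromosomes
    f_t = (freq_tt * pop_size * 2 + freq_ct * pop_size) / chromosomes
    return f_c, f_t


# Illustrative genotype frequencies only (not data from the study).
f_c, f_t = allele_frequencies(0.5, 0.4, 0.1, pop_size=100)
```

With these illustrative inputs the C allele frequency is 0.7 and the T allele frequency is 0.3; the two always sum to 1 for a biallelic marker.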


The column "diff (chi2 - p value)" shows the following values : 

a) "diff" is the difference in the frequency of the more frequent allele, which is 
allele "C", between the affected and unaffected individuals, as detailed in Table 6. 

b) "chi2" is the result of the statistical Chi2 test when taking into account the "diff" value and the 
size of both the affected and unaffected populations under consideration. 

c) "p value" is calculated from the "chi2" value, with a degree of freedom value of 1. 

The higher the difference in frequency, the higher the chi2 value and the lower the p value, 
and thus the stronger the association between the marker and the trait. 
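
The three values described above can be illustrated with a minimal Python sketch. The text does not specify how the Chi2 statistic was constructed, so this example assumes a Pearson chi-square test without continuity correction on a 2 x 2 table of allele counts (two alleles per individual); the function names are hypothetical. For one degree of freedom, the p value equals erfc(sqrt(chi2 / 2)).

```python
import math


def p_value_df1(chi2):
    """Upper-tail p value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def allele_association(freq_c_affected, n_affected, freq_c_unaffected, n_unaffected):
    """Return (diff, chi2, p) for the C allele, affected versus unaffected.

    Assumption: Pearson chi-square on allele counts, no continuity correction.
    """
    # Allele counts: each individual carries two alleles.
    a = freq_c_affected * 2 * n_affected            # C alleles, affected
    b = (1.0 - freq_c_affected) * 2 * n_affected    # T alleles, affected
    c = freq_c_unaffected * 2 * n_unaffected        # C alleles, unaffected
    d = (1.0 - freq_c_unaffected) * 2 * n_unaffected  # T alleles, unaffected

    total = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("chi-square is undefined when a row or column total is zero")

    chi2 = total * (a * d - b * c) ** 2 / denominator
    diff = freq_c_affected - freq_c_unaffected
    return diff, chi2, p_value_df1(chi2)
```

Under this assumption, equal allele frequencies in the two groups give a chi2 of 0 and a p value of 1, and a chi2 of about 3.84 corresponds to the 5% significance threshold.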

It has been considered by the inventors that a p value of 5% (5 x 10^-2) or lower is statistically significant. 



From the data shown in Table 6, it can be observed that there is a high chi2 value (10.95) and 
a very low p value (9.11 x 10^-4) when testing the M5S2 population, which corresponds to the 
sporadic cases. It can be observed that the p value is far lower than 5 x 10^-2 and thus that the 
difference in allele frequency between the healthy population and the prostate cancer sporadic 
cases is highly significant. 

It can then be stated that a very significant association exists between the 99-430-352 biallelic 
marker and prostate cancer, and more particularly with sporadic prostate cancer cases. 

While the preferred embodiments of the invention have been illustrated and described, it will be 
appreciated that various changes can be made therein by one skilled in the art without 
departing from the spirit and scope of the invention. 

What is claimed is : 



1. A purified or isolated nucleic acid encoding a TBC-1 protein having the amino acid sequence of 
SEQ ID No 5, or a peptide fragment or variant thereof. 


2. A purified or isolated nucleic acid comprising a nucleotide sequence of SEQ ID No 1, or a 
sequence complementary thereto. 

3. A purified or isolated nucleic acid comprising a polynucleotide selected from the group 
consisting of the nucleotide sequences of SEQ ID Nos 3 and 4, or fragments or variants thereof. 

4. A purified or isolated nucleic acid having at least 95% of nucleotide identity with any one of 
the nucleotide sequences of SEQ ID Nos 3 and 4, or fragments thereof. 

5. A purified or isolated nucleic acid according to claim 4, wherein this nucleic acid comprises 
the nucleotide sequence of SEQ ID No 6. 

6. A purified or isolated nucleic acid comprising the nucleotide sequence beginning at the 
nucleotide in position 171 and ending at the nucleotide in position 3728 of the polynucleotide of 
SEQ ID No 3. 

7. A purified or isolated nucleic acid comprising the nucleotide sequence of SEQ ID No 2, or a 
biologically active fragment or variant thereof. 

8. The nucleic acid of claim 7, wherein said nucleic acid further comprises a purified or isolated 
nucleic acid comprising a nucleotide sequence located between the nucleotide at position 1 and 
the nucleotide at position 170 of SEQ ID No 3, or a biologically active fragment or variant 
thereof. 

9. The nucleic acid of claim 7, wherein said nucleic acid further comprises a purified or isolated 
nucleic acid comprising a nucleotide sequence located between the nucleotide at position 1 and the 
nucleotide at position 175 of SEQ ID No 4, or a biologically active fragment or variant thereof. 

10. A purified or isolated nucleic acid comprising : 

a) a nucleic acid comprising a regulatory nucleotide sequence of SEQ ID No 2, or a biologically 
active fragment or variant thereof; 

b) a polynucleotide encoding a desired polypeptide or nucleic acid, operably linked to the nucleic 
acid comprising a regulatory nucleotide sequence of SEQ ID No 2, or its biologically active 
fragment or variant; 

c) Optionally, a nucleic acid comprising a 3'-UTR regulatory polynucleotide, preferably a 3'-UTR 
polynucleotide of the TBC-1 gene. 

11. The nucleic acid of claim 10, wherein the polynucleotide of step b) encodes a sense or an 
antisense RNA molecule. 

12. A purified or isolated nucleic acid useful as an amplification primer or as a probe, wherein 
this nucleic acid comprises a polynucleotide selected from the group consisting of the nucleotide 
sequences complementary to any sequence of a strand of the TBC-1 gene and the nucleotide 
sequences complementary to any one of the SEQ ID Nos 2-4, 7-8 and 13-14. 

13. A nucleic acid probe or primer comprising at least 20 consecutive nucleotides of the 
nucleotide sequence of SEQ ID No 1. 

14. A nucleic acid probe or primer comprising at least 8 consecutive nucleotides of a 
polynucleotide selected from the group consisting of : 

a) the nucleotide sequence beginning at the nucleotide in position 2077 and ending at the 
nucleotide in position 12290 of the nucleotide sequence of SEQ ID No 1; 

b) the nucleotide sequence beginning at the nucleotide in position 12373 and ending at the 
nucleotide in position 12738 of the nucleotide sequence of SEQ ID No 1. 

15. A nucleic acid probe or primer comprising at least 8 consecutive nucleotides of a 
polynucleotide of SEQ ID No 2. 

16. A nucleic acid probe or primer comprising at least 20 consecutive nucleotides of a 
polynucleotide selected from the group consisting of the nucleotide sequences of SEQ ID Nos 3 
and 4. 

17. A nucleic acid probe or primer selected from the group consisting of the nucleotide 
sequences of SEQ ID Nos 9-14. 

18. A purified or isolated nucleic acid comprising a polynucleotide selected from the group 
consisting of the nucleotide sequences of SEQ ID Nos 7-8. 

19. A purified or isolated biallelic marker, wherein said biallelic marker is from the sequence of 
the TBC-1 gene. 

20. A purified or isolated biallelic marker according to claim 17, wherein said biallelic marker is 
associated with prostate cancer. 

21. A purified or isolated nucleic acid selected from the group consisting of SEQ ID No 7-8 or a 
variant or fragment thereof, said fragment comprising at least 8 consecutive nucleotides of said 
nucleic acid and including the polymorphic base thereof. 

22. A method for the identification and characterization of a biallelic marker in the genomic 
region harboring the TBC-1 gene, said method comprising : 

- providing a plurality of primer sequences capable of amplifying portions of the genomic 
region containing the TBC-1 gene, and in particular portions of the polynucleotide of SEQ ID 
No 1; 

- amplifying portions of the genomic region containing the TBC-1 gene from a plurality of 
individuals using said primers to obtain a plurality of amplicons; and 

- sequencing said plurality of amplicons to identify biallelic markers in the genomic region 
harboring the TBC-1 gene. 

23. A method for the amplification of the TBC-1 gene or a fragment or a variant thereof in a test 
sample, said method comprising the steps of : 

c) contacting a test sample suspected of containing the targeted TBC-1 gene sequence or 
portion thereof with amplification reaction reagents comprising a pair of amplification primers 
located on either side of the TBC-1 region to be amplified, and 
d) detecting the amplification products. 

24. The method according to claim 23, wherein the amplification primers are selected from the 
group consisting of SEQ ID Nos 9-10. 

25. The method according to claim 23, wherein the amplification product is detected by 
hybridization with a labeled probe having a sequence which is complementary to a region of the 
TBC-1 gene. 

26. A kit for the amplification of a nucleotide sequence contained in the TBC-1 gene, wherein 
said kit comprises : 

a) A pair of oligonucleotide primers located on either side of the TBC-1 region to be amplified; 

b) Optionally, the reagents necessary for performing the amplification reaction. 

27. A method for detecting the presence of a nucleic acid comprising a nucleotide sequence 
selected from the group consisting of SEQ ID Nos 1-4 and 7-8 in a sample, said method 
comprising the following steps of : 

a) bringing into contact a nucleic acid probe or a plurality of nucleic acid probes which can 
hybridize to a nucleotide sequence included in one of the nucleic acids of SEQ ID Nos 1-4 and 
7-8 and the sample to be assayed; 

b) detecting the hybrid complex formed between the probe or the plurality of probes and the 
nucleic acid in the sample. 

28. The method of claim 27, wherein said nucleic acid probe or the plurality of nucleic acid 
probes is selected from the group consisting of SEQ ID Nos 9-10, 11-12 and 13-14. 

29. The method of claim 27, wherein said nucleic acid probe or the plurality of nucleic acid 
probes has been immobilized on a substrate. 

30. The method of claim 27, wherein said nucleic acid probe or the plurality of nucleic acid 
probes is labeled with a detectable molecule. 

31. A kit for detecting the presence of a nucleic acid comprising a nucleotide sequence selected 
from the group consisting of SEQ ID Nos 1-4 and 7-8 in a sample, said kit comprising : 

a) a nucleic acid probe or a plurality of nucleic acid probes which can hybridize to a nucleotide 
sequence included in one of the nucleic acids of SEQ ID Nos 1-4 and 7-8; 

b) optionally, the reagents necessary for performing the hybridization reaction. 

32. The kit of claim 31, wherein said nucleic acid probe or said plurality of nucleic acid probes is 
selected from the group consisting of SEQ ID Nos 9-10, 11-12 and 13-14. 

33. The kit of claim 31, wherein said nucleic acid probe or said plurality of nucleic acid probes is 
labeled with a detectable marker. 

34. The kit of claim 31, wherein said nucleic acid probe or said plurality of nucleic acid probes is 
immobilized on a substrate. 

35. An array of nucleic acid sequences comprising at least one of the sequences selected from 
the group consisting of SEQ ID Nos 7-8, 9-10, 11-12 and 13-14 or the sequences 
complementary thereto or a fragment thereof of at least 8 consecutive nucleotides thereof. 

36. An array of nucleic acid sequences comprising at least two of the sequences selected from the 
group consisting of SEQ ID Nos 7-8, 9-10, 11-12 and 13-14 or the sequences complementary 
thereto or a fragment thereof of at least 8 consecutive nucleotides thereof. 


37. A method for the detection in an individual of alleles of TBC-1 associated with prostate cancer, 
wherein said method comprises the following steps : 

a) obtaining a nucleic acid sample from the individual to be tested, and 

b) determining the presence in the sample of an allele of a biallelic marker or of a group of 
biallelic markers of the TBC-1 gene which, when taken alone or in combination with 
another/other biallelic marker/s of the TBC-1 gene, is indicative of prostate cancer, of an early 
onset of prostate cancer, of the level of aggressiveness of prostate cancer tumors, of a modified 
or forthcoming expression of the TBC-1 gene, of a modified or forthcoming production of the 
TBC-1 protein, or of the production of a modified TBC-1 protein. 

38. The method of claim 37, wherein the biallelic marker comprises at least one of the 
sequences of SEQ ID Nos 7-8, or a fragment thereof including the polymorphic base. 

39. A diagnostic kit for determining the presence in a DNA sample of alleles associated with 
prostate cancer, with an early onset of prostate cancer, with the level of aggressiveness of 
prostate cancer tumors, with a modified expression of the TBC-1 gene, with a modified 
production of the TBC-1 protein, or with the production of a modified TBC-1 protein, wherein 
said kit comprises primers for the amplification of the TBC-1 gene. 

40. A kit according to claim 39, wherein at least one of the primers is complementary to a 
nucleotide sequence of the TBC-1 gene comprising an allele of a biallelic marker associated 
with prostate cancer. 

41. A kit according to claim 39, wherein the primers comprise a nucleic acid selected from the 
group consisting of the nucleotide sequences of SEQ ID Nos 9-14. 

42. A method for the treatment of prostate cancer comprising the following steps : 

- selecting an individual whose DNA comprises an allele of a biallelic marker or of a group of 
biallelic markers, preferably markers of the TBC-1 gene, associated with a susceptibility to 
prostate cancer; 

- monitoring in said individual the appearance and optionally the development of a tumor in the 
prostate; and 

- administering an effective amount of a medicament acting against prostate cancer to said 
individual at an appropriate stage of the prostate cancer. 

43. The method of claim 42, wherein the biallelic marker comprises at least one of the 
sequences of SEQ ID Nos 7-8. 

44. A recombinant vector comprising a nucleic acid encoding the TBC-1 polypeptide of the 
amino acid sequence of SEQ ID No 5, or a peptide fragment or variant thereof. 

45. A recombinant vector comprising a nucleic acid according to any one of claims 1 to 11. 

46. A recombinant expression vector comprising a nucleic acid selected from the group 
consisting of SEQ ID No 2, or a biologically active fragment or variant thereof. 

47. A recombinant expression vector comprising : 

a) a nucleic acid comprising a regulatory polynucleotide of the nucleotide sequence of SEQ ID 
No 2, or a biologically active fragment or variant thereof; 

b) a polynucleotide encoding a polypeptide or a polynucleotide of interest which is operably 
linked with said nucleic acid; 

c) optionally, a nucleic acid comprising a 3'-regulatory polynucleotide, preferably a 3'-regulatory 
polynucleotide of the TBC-1 gene, or a biologically active fragment or variant thereof. 

48. A recombinant vector useful for the expression of the TBC-1 coding sequence, wherein said 
vector comprises a nucleic acid selected from the group of SEQ ID Nos 1, 3 and 4 or a nucleic 
acid having at least 95% nucleotide identity with a polynucleotide selected from the group 
consisting of the nucleotide sequences of SEQ ID Nos 1, 3 and 4. 

49. A recombinant expression vector comprising a nucleic acid comprising the nucleotide 
sequence beginning at the nucleotide in position 176 and ending in position 3733 of the 
polynucleotide of SEQ ID No 4. 

50. A recombinant host cell comprising a nucleic acid coding for the TBC-1 polypeptide of SEQ 
ID No 5. 

51. A recombinant host cell comprising a purified or isolated nucleic acid encoding a TBC-1 
polypeptide, or a polypeptide fragment or variant thereof. 

52. A recombinant host cell comprising a purified or isolated nucleic acid comprising at least 20 
consecutive nucleotides of a polynucleotide selected from the group consisting of the nucleotide 
sequences of SEQ ID Nos 3 and 4. 

53. A recombinant host cell comprising a purified or isolated nucleic acid comprising the nucleotide 
sequence SEQ ID No 2 or a biologically active fragment or variant of the nucleotide sequence of 
SEQ ID No 2. 

54. A recombinant host cell comprising a purified or isolated nucleic acid comprising a 3'-regulatory 
sequence of the TBC-1 gene, or a biologically active fragment or variant thereof. 

55. A recombinant host cell comprising a polynucleotide consisting of : 

(1) a nucleic acid comprising a regulatory polynucleotide of SEQ ID No 2 or a biologically active 
fragment or variant thereof; 

(2) a polynucleotide encoding a desired polypeptide or nucleic acid; 

(3) optionally, a nucleic acid comprising a 3'-regulatory sequence, preferably a 3'-regulatory 
sequence of the TBC-1 gene, or a biologically active fragment or variant thereof, wherein 
sequences (1), (2) and (3) are operably linked with one another. 

56. A recombinant host cell comprising a recombinant vector according to any one of claims 44 to 
49. 

57. A transgenic animal comprising in its somatic cells and/or its germ line cells a polynucleotide 
selected from the following group of polynucleotides : 

a) a purified or isolated nucleic acid encoding a TBC-1 polypeptide, or a polypeptide fragment 
or variant thereof; 

b) a purified or isolated nucleic acid comprising at least 20 consecutive nucleotides of a 
polynucleotide selected from the group consisting of the nucleotide sequences of SEQ ID Nos 3 
and 4; 

c) a purified or isolated nucleic acid comprising the nucleotide sequence SEQ ID No 2 or a 
biologically active fragment or variant of the nucleotide sequence of SEQ ID No 2; 

d) a purified or isolated nucleic acid comprising a 3'-regulatory sequence, preferably a 
3'-regulatory sequence of the TBC-1 gene, or a biologically active fragment or variant thereof; 

e) a polynucleotide consisting of : 

(1) a nucleic acid comprising a regulatory polynucleotide of SEQ ID No 2 or a biologically active 
fragment or variant thereof; 

(2) a polynucleotide encoding a desired polypeptide or nucleic acid; 

(3) optionally, a nucleic acid comprising a 3'-regulatory sequence, preferably a 3'-regulatory 
sequence of the TBC-1 gene, or a biologically active fragment or variant thereof, wherein 
sequences (1), (2) and (3) are operably linked with one another. 

58. A method for producing a polypeptide comprising the amino acid sequence of SEQ ID No 5 or 
a fragment or a variant thereof, wherein said method comprises the steps of : 

a) culturing, in an appropriate culture medium, a cell host previously transformed or transfected 
with the recombinant vector of step b); 

b) harvesting the culture medium thus conditioned or lysing the cell host, for example by sonication 
or by an osmotic shock; 

c) separating or purifying, from the said culture medium, or from the pellet of the resultant host cell 
lysate, the thus produced polypeptide of interest; 

d) optionally characterizing the produced polypeptide of interest. 

59. A TBC-1 polypeptide comprising an amino acid sequence of SEQ ID No 5, or a fragment or 
variant thereof. 

60. The polypeptide of claim 59, wherein said polypeptide comprises the TBC domain of TBC-1. 

61. The polypeptide of claim 59, wherein said polypeptide comprises the amino acid sequence 
EVGYCQGL. 

62. A polypeptide comprising amino acid changes ranging from 1, 2, 3, 4, 5, 10 to 20 
substitutions, additions or deletions of one amino acid as regards to the TBC-1 polypeptide of 
the amino acid sequence of SEQ ID No 5. 

63. A polyclonal or a monoclonal antibody, a humanized antibody or a single chain Fv thereof, 
directed against the TBC-1 polypeptide of the amino acid sequence of SEQ ID No 5, or to a 
peptide fragment or variant thereof. 

64. A method for detecting specifically the presence of a TBC-1 polypeptide in a biological sample, 
said method comprising the following steps : 

a) bringing into contact the biological sample with an antibody directed against the TBC-1 
polypeptide of the amino acid sequence of SEQ ID No 5, or to a peptide fragment or variant 
thereof; 

b) detecting the antigen-antibody complex formed. 

65. A diagnostic kit for detecting in vitro the presence of a TBC-1 polypeptide in a biological 
sample, wherein said kit comprises : 

a) a polyclonal or monoclonal antibody directed against the TBC-1 polypeptide of the amino acid 
sequence of SEQ ID No 5, or to a peptide fragment or variant thereof, optionally labeled; 

b) a reagent allowing the detection of the antigen-antibody complexes formed, said reagent 
carrying optionally a label, or being able to be recognized itself by a labeled reagent, more 
particularly in the case when the monoclonal or polyclonal antibody is not itself labeled. 

66. A method for the screening of candidate substances for prostate cancer treatment, wherein 
said method comprises the following steps: 

a) providing a cell line, an organ, or a mammal expressing a TBC-1 gene or a fragment thereof, 
preferably the regulatory region or the promoter region of the TBC-1 gene; 

b) obtaining a candidate substance; 

c) testing the ability of the candidate substance to decrease the symptoms of prostate cancer 
and/or to modulate the expression levels of TBC-1. 

67. A method for the screening of a candidate substance interacting with the TBC-1 
polypeptide, wherein said method comprises the following steps : 

a) providing a polypeptide comprising the amino acid sequence SEQ ID No 5, or a peptide 
fragment or a variant thereof; 

b) obtaining a candidate substance; 

c) bringing into contact said polypeptide with said candidate substance; 

d) detecting the complexes formed between said polypeptide and said candidate substance. 

68. A kit for the screening of a candidate substance interacting with the TBC-1 polypeptide, 
wherein said kit comprises : 

a) a TBC-1 protein having the amino acid sequence of SEQ ID No 5 or a peptide fragment or a 
variant thereof ; 

b) optionally means useful to detect the complex formed between the TBC-1 protein or its 
peptide fragment or variant and the candidate substance. 

69. The kit of claim 68 wherein the detection means consist in monoclonal or polyclonal 
antibodies directed against the TBC-1 protein or a peptide fragment or a variant thereof. 

70. A method for the screening of a candidate substance or molecule that modulates the 
expression of the TBC-1 gene, wherein this method comprises the following steps : 

a) providing a recombinant host cell containing a nucleic acid, wherein said nucleic acid 
comprises the nucleotide sequence of SEQ ID No 2 or a biologically active fragment or 
variant thereof, the nucleotide sequence of SEQ ID No 2 or its biologically active fragment or 
variant being operably linked to a polynucleotide encoding a detectable protein; 

b) obtaining a candidate substance, and 

c) determining the ability of the candidate substance to modulate the expression levels of the 
polynucleotide encoding the detectable protein. 

30 71. The screening method of claim 70, wherein the nucleic acid comprising the nucleotide 
sequence of SEQ ID No 2 or a biologically active fragment or variant thereof also includes a 
5'UTR region of one of the TBC-I cDNAs of SEQ ID Nos 3 and 4, or one of their biologically 
active fragments or variants thereof. 

72. A kit useful for the screening of a candidate substance or molecule that modulates the
expression of the TBC-1 gene, wherein said kit comprises : 

a) a recombinant vector that allows the expression of a nucleic acid comprising a nucleotide 
sequence of SEQ ID No 2 or a biologically active fragment or variant thereof;

b) a polynucleotide encoding a detectable protein which is operably linked to the nucleotide
sequence of SEQ ID No 2 or a biologically active fragment or variant thereof. 

73. A method for the screening of a candidate substance or molecule that modulates the 
expression of the TBC-1 gene, wherein said method comprises the following steps :

a) providing a recombinant host cell containing a nucleic acid, wherein said nucleic acid
comprises a 5'UTR sequence of one of the TBC-1 cDNAs of SEQ ID Nos 3 and 4, or one of
their biologically active fragments or variants, the 5'UTR sequence or its biologically active 
fragment or variant being operably linked to a polynucleotide encoding a detectable protein; 

b) obtaining a candidate substance, and

c) determining the ability of the candidate substance to modulate the expression levels of the
polynucleotide encoding the detectable protein. 

74. The method of claim 73, wherein the nucleic acid that comprises a nucleotide sequence 
selected from the group consisting of the 5'UTR sequence of one of the TBC-1 cDNAs of SEQ
ID Nos 3 and 4 or one of their biologically active fragments or variants, includes a promoter
sequence which is endogenous with respect to the TBC-1 5'UTR sequences.

75. The method of claim 73, wherein the nucleic acid that comprises a nucleotide sequence
selected from the group consisting of the 5'UTR sequence of one of the TBC-1 cDNAs of SEQ
ID Nos 3 and 4 or one of their biologically active fragments or variants, includes a promoter
sequence which is exogenous with respect to the TBC-1 5'UTR sequences defined therein.

76. A kit for the screening of a candidate substance modulating the expression of the TBC-1 gene,
wherein said kit comprises :

a) a recombinant vector that comprises a nucleic acid including a 5'UTR sequence of one of the
TBC-1 cDNAs of SEQ ID Nos 3 and 4, or one of their biologically active fragments or variants,
the 5'UTR sequence or its biologically active fragment or variant being operably linked to a 
polynucleotide encoding a detectable protein. 



GGACAGTATC TAGCACAATA CCCCAAATCG ACTAACTCCT CCGTAAAGAA TAGCTACCAC 60 

TATTGTGAGA GTTTTAAGTC AAGCTGTGAA TAAAACTCTT GGGTCCACTT AAAAATACCT 120 

CCCCTGGATG TAAGCATCCA GGGAAATCAG GGAATGCCAT AAGACAGCCC TAATCTAAAA 180 

GCCTACAAGA AGCTCAGTGG GCTTCAAGGA AGACACTGCT CTTGGTACGA TGAGGAAACC 240 

TGGCCCTCTA TTTGCCTCCT GGGCCACAGT AATATTGATA ATAGCTGCTG CTTTTAGTTG 300 

AGGACCATGT ACGTCTGTGT CACTGCACTG GCCACTTTAC TTACACTTTC CTGCTTTGTC 360

CTCACAAAGA TCCTGTAAGG TGTGTATTGG TCCCATTTAG CAGGTAAGAC AATGAAGACC 420

AGAGGTCCAG CACCTTGCCT AAACCACACC TGCTGGGATT TGGATTCAAG TCCAACCGTA 480

CAGCTCAAAC GCTCAGCCAC TTCCCTAAAG TCCACCCCCA GCTACATTAA GTAAAAAAAT 540

CCAGAAAGAT GCCACCTGGG GGTCTGGAAC TGCCTCCTCC GAGCACCCGG CTCTCCCCTC 600

CCTGCGGACT CTTCTCTGGA GAGGATGTGA TGCTTCTTAC TTTTCTCAGA TCCCTCTCCC 660

CACCCTGCGA GTGACGTTGC GCCTCTGTGC CTGGTGGGAT AGGGATCTGG GAGCTTCGCC 720

TGTTTTTTGC ACACTGCCAT CCCCTAGTCT TAGGGAGCGA GCTCTGTCCC GCTTTTCACA 780

TCTCCGCGTC TTTCCTTGCA CTCTACATCA CCGCTGGGAA TGTCCCCAGA CCTGATCGGG 840

GCATGCACAC TGGGGTGTGC GTGTGCGTGT GGTGTGTGTT CCTGCGCGTG TGCCGGGCTC 900
GCGGGGCAGG AAAAAGCGCC TAATCCAGGC TCTGCGTCAC TCCCGCAATT GGTTAGAAAT 960
GGAGTTTCCT GGTGTTTAAT CCCGGGAGGG CACTTCGCCT TCGTTGTTTC CCAGAGTCCC 1020
TGATTTTCCT GCCTCGCATG CCAGCGCCCC ATAGGGCATC CGTGCCTCAG TTCACCTCTT 1080
GCCATCCTCC AAGGACGGGG AGAAGGGGTA AGGCGGGGGA GAGCAAGGTG GCTTGGTCGC 1140
CCCCGGCCCC CGCCCCCCAT GTTGTGTGCA GTTTCCACCA CGTCTGTTTC GGAGGGAGAA 1200
GAGGAGGGTG CAGATGAGGC GAGGCGCCTT CGGGAGCGCG GAGAGCGGGC AGGCAGTGCC 1260
ACCTGCTGAG AGCCACTCAG GCCGAGCAAG CGGCGGGCAG TGCCACCTGC TATAAATAGG 1320
CCGCCAAGGA CAGGGTGTGC GACTGTACAT CCCGCCACGA GGGCCTGCAT CACGCGCGGG 1380
GCCCCGCGCC CCCGGCTCCC CAGGGAAACG CTGTGCCCAG ATCCTGCGCA GGGGTCTGGA 1440
TGGGGCGGCG GCCCGAGTAC TTCCCCCCTA TTCCCCCCAC ANACACTGGC TGAGGATGGC 1500
CCGCGGGCTT GGGGGCGGGG GGTGGCAAGG AGGGGAGGGA GGCCGCGGCG GACCCGCAGT 1560
GCAGCAGCTG TTGCTCGCGT GTGACTCGCC CGTCCGGGCC GTGCTGCCCA GGCACAGTCA 1620
CACGGCGCAG TGGGGAGGAG GAGGACACCG AGTCCCCCTC CCAGCTCCCC GGGGACCGAG 1680
TGGGGAGATC CCGGCTCCTG TCTTCCCCTC GCCTCCAGCG CGCTCGCCCA GGCTGGGAGG 1740
AGGAAACCAG AGCCGCGCGC AGACACCTCC TCCTTCTCCT CCTCTTCTTC CTCCTCCTCC 1800
TCCTCCTCCT CCTCCTCTTC GGCTGCTGCT CCTGGTGCCG CCACCGTCCG CCGGTGCCTG 1860
TTGCTGCCGC CGCCGCGGGA CCTGCTGTGT CCTCAGCTGG GTGGAGAAGA GGCGGGCGCC 1920
GAGCCGAGGG GAGCCCCCTC CCCGTCCCCC CGCGGCGGGA AGAGCGCAGC CAGCCGGGTG 1980
CGATGGACTC CCCGCCCGCC CAGGCCGTCC CCAGGATGCC CCCAAGCACC TGCGCGTCCC 2040
GGCCCGGCCC CGGGCTCTGA GCGCGCCGCG GCACAGGTAA GGCGCTTCCT GGGGCTTCGT 2100
CCTGGCCACC CTGCTGGCTC CTCTCGGGGC GTCGCGGCCG CCCCCTCCCG CAGCACGCCC 2160
CTGCCCCGCC TGGCCGCGGA GGGGAAGGCA TCTGGCCGCC CACGGACGCG AGGCCAGGGT 2220
CTCTCGGGGG AGGAAGTTCA TTGCCATCTC GTTGCCCCCC TTACCCCCCC ACCCCCGCCG 2280
CCCTTGGACG AAAGCGAAAC CTTAATGTTG CTAGCGACCC GAGAGCTCCG CCGGCTTCTC 2340
CCCCAACCCC CGCCAGCTCA CTGGTCCGCG CATCTCTCCC CTCCCCCCTC CCGCCAATTA 2400

TCCTAGCGTG TTTGCAAGGC GACCAGATTG GAAAGAGTGT GGTCAGAGTG ACCCCAAGCC 2460 

ACGCTTTAAA AGTTCAGGGT ACTTTGCAGT AGTAACTTTG GCAGCTCCAC CAGTGCGCGC 2520 

AACATTTCTT TCTATGGGTA CATCCTGTAC CAGTCATTTT GAAACCCTGC TTCATTGTTT 2580 

CTAGCCGCTT CCTGATGGCT CTGTGATTAT GAGACCCCCC TCAAACTTCA CCAGGCATTA 2640 

AGGTTTTGTT TTTGCTTTTT TTTCAGAGAG GTATCATTTC GTTTGAAATC CACCTAGATG 2700 

TGGCTTTTCC TGTTTTGATT TTACTTAACA TAGCTTATTC TCTGGAAGTT GCTTTAAAAA 2760 

GAAATTGAAA GTGATGGTTG TTCCTTCCAC CAAACAGTTT AATTTTCAGG GTGCCTCATA 2820 

TTAATGGATA TGTTTTCCCT TCATAGATTT CTCATTGTTT CCCTTATGAT GGGATGATTT 2880 

CATTTATTAA TAAAATCAGA CTTTGAAAGA GCATTTAAAA ATGACCTGGT TTAAATAGGT 2940

CACACCCAAG AAACTCAGCT ATCTGTACAA GTTCAAACTT CTAAACTTTT TCAATGAGCT 3000

AGGGGTGGTG GCACCCACCT GTAGTCCCAG CTACTTGGGA GGCTGAGGCA GGAGGATCAC 3060

TTGAGCCCAG GAGTTCGAGG CCATAGTGAG CTATGACTGT GCCACCTCAC TGGAGCCTGG 3120

GTGACAAAGT GAGATCCCAT CTCTTAAAAA AAAAGAGTTT AGGGGACATT TTCTGAAGTG 3180

AACACAAGTA GAGCATTCTA ACACTATTGA GTGCAAGGAG ACCTGGAAGG GACTAAGTGG 3240

TTCAAAGCAG GAAATAAAAT CATCAGGTGA TAATTAAAAT AATTTCTTTC CTGTGGATTT 3300

GTCCAGCCAT TTGCAAACCA GGAGAATAGG AAAAAAAATC ACTAGTGTAG TTATAAATTA 3360

TTACATTACG TTTTCAAAGG AAAATTTTGC AAATGCGTCT CCTTGTCATA GTCTATTGTT 3420
ATCTACCCCA CTGAGAGTGC TGGGGCTTCC CCTTTTCACC ACGACAGCAT TTCTGGTTGG 3480
GTGGCAGTCA TGCAGTGTTG ACCTGGTGTC CCATAAGGCA CAGTTTGTCA AAACACTAGT 3540
GGGTATTAGG AGGAAACGTG CAACTCTGAA GCAACAGAGC TTGCCCCTTC TTCCTCATTA 3600
TCCAGCTGGT GATAATCCCT GTCCCCCACT TCCCTAGAAG ACAGCTTTGA CCAGGAAGGC 3660
TGCAATGACA ATGAGATGTA CCCCTATGCA GAGCCAGATG TGGGCGGGTG GCTTTTTTGT 3720
GGTCCAGATC TTCTAGGATC TTCTAGGATG TAACCCTGGC AAGCAGTGGG GAGCCTGAAT 3780
CAAGCAGCAT GGCTGTTACC TCTTCTGTGT TCACAGCAGC ATCTTCAGTT GTCTTGGTGC 3840
CTGGAGCAGG CACCACAGCT GCCTGCTCTG TTGGCCACCA GCTTTCTAGA GTAGATGGTA 3900
GGGAGGAGAG CAAGGGGCTC AAGAGGATTC TGTCTTTGAA CATGCTTTTA ANTTTGATCT 3960
GACAGAATGG CAGCTCCCTG AAGTCCTTCC TACTCTCTCC ACAGCATTTC TCTGTAGGTC 4020
CCCAGTTTTT GCTCTTTTCA GATTCCCAGA GGACNTGAAA ATGTATCACG GCCCATTTGG 4080
GGACTTCCTG TATATGTGTG GGTGCCTCAG GATCATTTGT TTTGCCCTTT TCCAGTCTAC 4140
CGTGCTGCCC TTCTCAAGTT TAATGNACCA CGTTAGTTTC AATATTTTAT ATATTTCTCA 4200
GCAGTTTTCA TCTCTTGGTC ATTAAACTTG AGAAGTAAAA TCTGCTCATT AAAATGACTG 4260
AGTCCATGGC CAGGCATGGT GGCTCATGCC TGTAATCCCA GCACTTTGGG AGTCCAAGGC 4320
GGGTGGATCA CTTGAGGTCA GGAGTTCGAG ACCAGCCTGG CCAGCATGGC AAAACCCTGT 4380
CTCTACAAAA ATATAGATCT ACAAAAACTA GCCAGGCATG GTGGCATGTG CCTGTAGTCC 4440
CAGCTATTTG GGAGGCTGAG ACAGGAGAAT CGCTTGAAGC CAGGAGGCGG AGGTTGCAGT 4500
GAAACATGAT CGTGCCACTG AGTCCATTCA GCAGCAGAGT AGTGTTGGGG TTTGTATCCC 4560
TGTAGTGATG ACGAAGGNTT TAGGTTTTCA GTCAGAACTG TTACCTTACA ATTTCCTTCA 4620
CTGACTTTTC TTCCTTTCCA ACACCACATT CCAATAAAAA ATATCTTTAG ACCAGATTCT 4680
TCACGAAAGA CATGAAGGTT TTCATGCTTC AAGGTTTTTG ACTTTTTTTT TTTTTTTNAA 4740
AGGAGTCTTG CTGTGTCACC CAGGCTGGAG TGCAGTGGCG TGATCTCAGC TCACTGCAAC 4800
CTCCGCCTCC TGGGTTCAAG TGATTCTCCT GCCTCAGCCT CCCAAGTAGC TGGGACTACA 4860
GGCGTGCTCT ACCACGGCCG GCTAAATTTT GTGTTTTTAG TAGAGGCGAG GTTTCACCAT 4920
CTTGGCCAGG CTGGTCTTGA ACTCCCGACC TTGTGATCCA CCCGCCTTGG CCTCCCAAAG 4980
TGCTGGGATT ACAGGTGTGA GCCACGGCGC CCGACCAGTT TTTGACATTT CTAAGCCAAA 5040
AGTTCCATTT GATGAGGTCT TAGATGCAGG GGCAATGTGT CCCTTTTCAG ATTTCAGATG 5100
TTTAGAAAAA GATGTGTCAT ATTTGGGCCA ACTGAAAAAC TCTTGATATG TAGGTTTTTA 5160
TGAAGCTGTG CAGAATGTAG GAAATACATT TTAGAACCAA CAAAGAGGCA TTTAATTTTG 5220
AGTGTGCCTG TCTCCTTTGA GATGAGCAAC AGCTATTTTT CTCTTCAAAA GACAATGCGT 5280
GTATTTATCA GCACATTTTA TATAATCAGC AAATCTAAAC CTCTGAATTA GGTAAGCCCT 5340
ATAGGTTTGT TGCCAGAATT AGTGAATTTA TACATGCAAA GTGCTTAGAA CAGTGCCTGG 5400
TACACAGTGA GCACTCAATA TTATTTATTG CTATTATTAT GTTTATTTAT TTTATACTTT 5460
TAGAGTATAA TTTTGATGTT AGGTTTGGAT TGCTGAGGCC AAGCAAAATT TAGATAGACC 5520
AACCCAGCTA ATCCACTAGA AAGATATTTG AGGGTTATTC CCATCTAAAG ATCTATGGGA 5580
TCTTTGGATA TCATCTGTGA GAAACAACAG AAGTTTGTAG ATAAGACAGA TATAGATTCA 5640
AATGCCACCT TCACAATTTA TTAGTGAGGG AACCCTTTGG TAAAATGAGC ATGACAAAAC 5700
CTACTTCTCA GCTTTGTGAA CGTACAGAAG ATCATGAATG TAAAATGTCC ATGAAGTGGT 5760
AGGTGGTCAA CAGATATTGC TCTAGCAAAG TGGTTAAGAG CAAGCAAACT CTGGAGCCAA 5820
AGGGCCTGGG TTCAAATCCC GATTCTGCCA CTTCTTATAG TATGGCCTTG GGCAGGTGAC 5880
CTCACTTTTC TGTGCCACTA TTCAATGATA ATATTCCTTT ATTGTCCAAC GTTTTGTAAG 5940
TTAAATCAGT TAATAAACAC ACTATGATAA TGTGTTGGTA ACTATTCTTT TTACTTTTAG 6000
CAGAATAACT TAAAGGAACA CTGCAGGTAG GAGGGTTATA CATAATCTCT GAGGGCCAGC 6060
TAGGACACTC GCCCATTTCC CACCCTTTTT CCTGTGCAAT GAAGAGTATA AGAAGTGACA 6120
GTGCCATTCT AAAGGACTAG CCTTGAGTTG GCTCTAATTT ATATGACTCG TGCCTGTAAT 6180
CCCAGCACTT TGGGAGGCCG AGGCTGGCAG ATCACNTTGA GGTCAGGAGT TGGAGACCAG 6240
CCTNGGCAAA CATGACGAAA TCTCATCTCT ACCAAAAATA CAAAGATCAG CCGGGCATGG 6300
TAGCACNCAC CTGTAATCCC AGCTTCTGGG GAGGCTGAGG CAGAAGAATT GTTTGAACCC 6360
AnnGGGCGGA GGATGCAGTG AGCTGAGATC ACGCCACTGC ACTACAGCCT GGGTGACAGA 6420
GTGAGATTCT GTCTCAAATA ATAATAATAA TTTATATGAG AAAGAAGTCA TTCAAAAGCA 6480
TCATTACTTT ACATGTCAAA TTAGAAAGGC ACACCCCAGT ACTAAAGCAT CCTTGATGAT 6540
GAAAACATTT AGAACCACTG GTTTCAGGAG CTCCATGCAA TGGTGAAACA GCCTCTACTC 6600
GAAGGGTTGT TGCTCCCTGT GGGATTCCTG GGTGAGGAAC ACACTGCTCC CGTTGGGGTG 6660
riAATCCTGTG GAGGTLAGTGG ATGAAGAGTG TAGCCAAGTC AGTGAGCCTA CTGCATGGGA 6720
TTAGAGTACT TCAGGTTGCA GTATAATTCT GTTCAGGTGC ATGCTCACTC CATCTGGCGT 6780
AAAGAACAGA GAAATTAAAC CATTGATTCA CAGAGCAATA TGAGTAGCTG CCTGGGGACT 6840
TCCTGTCCAT TCTGGGGTNC CCAACAGCCA ATCAATATTG GCCGGTTCCT AATCTGACCT 6900
AGAGCTAGAG GCCACTAGCA                                  CCCACTCCCA 6960
CCAGCATTCT GAGGAGGTGA GGGCTGAAGC TGCAGAGGCT GTTGTACTGT CAGTTACTGT 7020
AAAGTCCCCA TCCTGCTCTC TGTAGTTTGC TGTGAAGGAG TGGAGGGGGC TGGGAACAAA 7080
GGGGGTTCCC AATAAGCAGC TTACTAATAC CTACCCTTGC TTCTCTCACT TCCTGATCAA 7140
TAATTCCCAT CCTTTCTTCA TGTACCTCCC CCACATTTTT GTTCTTTAAG GGAAGAAGGG 7200
AGCAGCTTTC CATCCCCACA CAATGTTGGG GACATTTGGT ATTATACATT ATGAAAATAA 7260
GATTTAGCAG ATTTTAGAGA TGGAAGAAAA CTTGGGGATG ATCTTGTTCC ATTCTCTTAT 7320
AAGAACAAAC AATATTTGGA GAAGCTGAGT AACTTGCTTT TTCAACTTCA CACTTGAAAG 7380
TGTTTTCATG AGGAAGTTGG GGTTCTCTGC AGCACTTGGA TGGGAGTCAG GGACTTGGAT 7440
TGTCCCAATT CTGTCACTAA ATTTGGACAA GCCACTTAAC TTTTCCAGAA TCTAGTTGCC 7500
TCATTTCAAA TATTAGGGAA ATTTCTAAAT GGCTTAAAAG GAGCTTGTTA GCTTTAAAAT 7560
TTTATGATTC TAAGTGTATG CTGCCAGAGA TATGTAGCAT AGCAGGACAC ATTAACAAGA 7620
TTATTGAAAC TGTTCTAATA AAGGACATCT TTGTGTCTTG GGTAGCTACT ATGTTTAAAG 7680
ACTGTGCTAG GTGGGAGTTG TGCAGAATAC ACAGGTTTGC TGTAGAGGGA TAGGGCGTGT 7740
ACACAGACAA CTCTACTAAC AAGAACGTTA CTAGAAGCTT ATTGGAATCA CAGTATTTCT 7800
TGCTGAGGGT ATGAAACATA AGAGTTCTCC TTGGAATATG AGGTTCTATT TGGGGCTTAA 7860
AGAATGGTCA AAGGTTGAGT GCAAATAACA TGGATTGAGA TGGCTTTAAA AAATAATCAA 7920
ATGGTTTGTT AGTATTAAAC TGGTGCAGAA ATAATTGCAG TTTTTGCCAT TCCTTTTAAT 7980
GGCAAAAATT GCAGTTACTT TTAAACCAAA TCCCTAATAT TATTTGCATA GTTTATCTCT 8040
GTTATGGAAG TTTTTATTGA CAAGTAATGT AGATATTCAC CTGATCTAAG TTACCCTGAA 8100
TCTTATATTA GCAGAATCTG AATTGCTTAT AAATAATTAT GGCTATGTTG GATGTAGAAC 8160
TTATTATTTG ATAGTTTATG AACAGTGCTA AGGTCTAATC TACTTTTTAC AGAGAAGCTA 8220
AGAACATGCT ACAGCTGGTT GAAAAACAAA AACTTCAGGC ATTGAAATGT TTTGTCAATG 8280
AAATGGCAGG ACTCATTTGA TGACTGATTA TTATCAACTG ATTTAAATGA CTGAATTTTT 8340
GGTACTGTGT ACATCTATAC TCTAAGAAGG AAATTGAAAG TAATTCTGCT ATGCTTGTTG 8400
CCACTATATT AATAACTGCA TCATCTAAAA TAATTGATAG AGCTCAGATT TATCCTTTGT 8460
AATAATTCTA GTACTTCTTT AAACATGTTT TGGGATTAGC AGCTGTCAAC AGTTAGAACA 8520
TGAAACAGAT TCTGTTACAG GAGTAGAAGT CGATCCAGAC ATTTAATGTC ATTTTCACCT 8580
GTGAGAGAGA GAATAAAGAG AAAGAGAGAT CATTATTTAT GGGATTATGT GAACTTCAAG 8640
TCCGTTTTCA TTATTAGGAG AAGCTGTGCT TTAAAGGACA GTCAGGGACT TTACTTTCAT 8700
GAAATGCCTG AGCTGTAAAT AAAGTATTGC TTTATTTTTT ATTTCTTGAA CATTTGAAAT 8760
AAAAAATTAG CTATGAGTTA TGTTCAAATT ATATTATAAA AATTTGCTCT TAGCATTGTG 8820
CATATATATT ATACAGAAAA ACACACAGTA AAAAGAATAG ACTTCAGTTC CTGTTCAGAA 8880
AAGGTTTAAA ATTTGAATAC TGATTTTGGA AACCCCAAAC CTTAAGAATT CAAGAAGCTT 8940
ACGGTCTTCT TGAGGGACAC CTATTCAAAC TCTTAAATAT GGTGATTGGG TAGAAAGTGC 9000
AGAAAAGCCT GCTGATACAT GCCCTAAAAC ACCTTGGAAA AAAGAGGTGG TAGTTGCTTG 9060
AGGTAGGACT TAAGTACTAG TTGGAAATAG AAGACAAGGA TGGAGACTGT TGGTAGATGA 9120
CTCTCCATGG GTCCTTCCTG TTTCTACACA CCTTGTAAGC AGGGCATTGA GTGCCTGTGT 9180
TCCAAACTAC CTTTTCCATC ATGTTTCTAC AGCAAACAGT CATGGAAGAT AGAAATAGAG 9240
TCTTCCTCTG GAGCAAAGGG CAGACACGCT TGCTTCCTGT ACTTCCCACT ATAAGATATT 9300
CCGGCTCCCT AAACTCAGCT GCCTTTCCTG TAACCCACCA TGATACAGAT GTCACCTGAC 9360
CTGTGGGAAT TGGGGGTCAG GGAACCAAGA GAAATGCTGA CTGTCTGGCT ACTGTGACTG 9420
CCCTGAGTAA TAAATTGTCC TTCGTCTCCA ACCCAGGAGT CTCATGTTTT CTACCAGCAG 9480
GATAACTGTG GCGGGCTAAC GTGTTAGTTT GCAAGTAAGG TAAAATCTCA GACCCTTTGC 9540
AGTTTGTGGC AGGGATTATA TTCTGACGAG AGAGGAACCG TATGCACCAT GGCTCAGAGG 9600
CATGAGAAAC GGGGAACCAT AACTAGTTCT CTATCTTCAG AGCCTTTAAA AGGTGCACCA 9660
AGGAGGGCAT TTTAGGGGAG AATATAAAGT TGGAGATATA GACACAGCCA GATTCCTGAG 9720
AGACCTTATA TGCCAGGTAG AAGACTTCAG ATTGTATGGG GGAATTATTA GAGAATTTTT 9780
AGCAGGGGTG TGATATGATA AATTTTGTGT TGATTAAGTT ACTCCAGGAA ATATGCGATG 9840
GGTGGATTGA AGGATGGGGC ACCTTTTNTC TAGGACGAAA AAGAAA^AGT AGTTGGTGAA 9900
GTCAGTTAGA GGTAGTTUMA GGATGAAGAA GGGATCTGAA TGACCCCTTG GCCATTCAGT 9960
GAGTAGTGAT GCTATTCACC TAGATACAGC ACATAGNNGG GAAANGAAAT NCTGGGAAGG 10020
AGGGAGATGA GACCGAGTTA GCTTTAAAAT AACTAAATTC AGGCCTAGGA GCCTATAGGC 10080
TATCCAGATA GAAAlAi i iH                     ACTCAGGAAG GAGGCTTCCG 10140
TGGGAGCAGA ACACTTGGGC ACCATTAGGG TGTATGTGGT ANATGCATTC TTGTGCAGCA 10200
GTCAAGGGGA TGGGATTTAG ACTCAAGTGC AAATTGCCCC CCATCTCCTG TGATAAGTGA 10260
CTGAAGCTCT CCGGGCTTCA GTTTCCTAGT TCATCATAGT GGGCTCTAGC GGATAAATGT 10320
TACAAAGGTT AAATGAGACA ACATAGGCAA AGTGCGTGGT ACTCAATAGA AGTCAGCTGC 10380
TGTCATCAGC AGCAGGATCA CCAGAATGTG GTGCTTGACA CCAAAAGATT AGGTGAGATT 10440
GCCCAAAACA GCAGGTGAAA TGAGGGGAGA GGATGNAAGT CAAACACAGG AAGAAAAGCC 10500
TTTGAAGTAT GTGGAAAGAA ACAACCAGAA AGGTAAGATA AGAACCAGAA GAGATTCAAG 10560
AAGGAAGGTG TGGCCGGGCG CGGTGGCTCA AGCCTGTAAT CCCAGCACTT TGGGAGGCCG 10620
AGGCGGGCGG AACACGAGGT CAGGAGATCG AGACCATCCT GGCTAACACG GTGAAACCCC 10680
GTCTGTACTA AAGATACAAA AGAATTAGCC GGGCGCGGTG GCAGGCGCCT GTAGTCCCAG 10740
CTACTCGGGA GGCTGACGCG GGAGAATGGC GCGAACCCGG GAGGCGGAGC TTGCAGTGAG 10800
CCGAGATCGC GCCACTGCAC NTCNAGCCTG GGCGACAGAG CGAGGAGCCG TCTCAAAAAA 10860
AAAAGAAAAA AAAAAAAAAG TAAGGAAGGT GTGGCCAAGA TTGAGAAATT CGTCAGAGCA 10920
AACAAGGCAG TCAGGGGCTA AATAGCCTCC TTTAAATTTT ACAACCTTGA GGACCTCGGC 10980
AACTTTAACA GAATTTCAGT GGATCCCTAG GGCAAACCAG GCCTTACAAA CCAGGAATGG 11040
ATGGTCAATA GGAAGTGGAG ACAGTAAGTG TAGACCTTAC CTTGGAGGGA AGGNAAGAGA 11100
AAGAGCCATG GCCAAGGGAA GTTTGAAATC AAAGGAAATA TCTTTTTTTT TTTTTTTCGA 11160
TTGGAGAGAC CTCAGTTATT CTTTTAAAAT ACTTATTGAG CCCCTCAGTT ATTCTTTTAA 11220
AATACGTATT GAGTCCCTAC TTTGAGTCAG GCACNATGGC AGACACGAGG GNGATAGCAG 11280
TGAATCAGAC AGATGCAACG CCTGCCTTCA TGGAGTTTCA CCTTAGCATC TGTCCATATG 11340
CTAGGGGAGT GGGGCAGGGG CAGGGAGCTG GATACAGGAG AGACTGAAGA TCCAGGGAGC 11400
AAGTGAGTAA AGAATAGGGC TTGAGATCCC ACAGACAACT CAGCTTTGAA CAAAAGGGTT 11460
TTGTCATCCA ATAGGACAAG AAGGCGTTAG GATACATCAA ACGTGGTTGT TGAAAACAGA 11520
AAAGGGCTGG GCACTGTGGC TCATGCCTAT AATCCCAGCA CTTTGGGAGG CCAAGGTGGG 11580
CAGATCACTT GAGGCCAGGA GTTCGAGACC AGCCTGGCCA ACATGGTGAA ACCCCATCTC 11640
TACTAAAAAT ACAAAAATTA GCCAGGTGTG GTGGTGCATG CCTGTAATCC CAGCTACTTG 11700
GAAGGCTGAG GCAGGAGAAT TGCTTGAACC CAGGGGGTGG AGGTTGCAGT GAGCCACGAT 11760
CGTGCCACTG CACTCCAGCC CGGGCAACAG AGCGAGACTC TGTCTCAAAA AAAAAAAAAG 11820
GAAGAAAGAA CATAGACAGG GAAATGTAGT TAAGGNNAGT TTGGGTTTGG GTTTGGTAGA 11880
AGCGTTTTCT GTTNNTTGTT TGTTTGTTTT CAGAAAGAGT CTCACTCTGT TGTCCAGACT 11940
GGAGTGCAGT GGCACAATCT TGGCTTGCTG CAGCCTCTGC CTCCTGGATT CAAGCAATTC 12000
TCCTGCCTCA GCCTCCTGAG TAGCTGGGAT TACAGACACC TACCACCACA CCAGGCTAAT 12060
TTTTGTATTT TTAGTAGAGA CGGGGTTTCA CCATGTTGGC CAGGCTGGTC TCAAACTCCT 12120
GACCTCAGGT GATCCACCTA TCTTGGCCTC TCAAAGTGCT GGGATTACAG GTGTGAGCCA 12180
CTGCACCTGG CCTAACATTG ATATCTGTTG ATGAGAAGAA GCCAGGTGTT GGAGTGATAG 12240
CTTATAGCAC ATGAACTGAA TAAAACAGTG TTTAAGACAA TGTTTGCAAC ATAATAGGCA 12300
CTGAAGACAT GTTAATGGAA GGTGGATTTG TGATTCAGAA CCTCTAGACT ACCTGGGCGA 12360
GTCTTTTAAA ATGTAAGTAA TATCTTAAGT GATATTACTT GTCCCAGATC AGTTGTTTAA 12420
AACTGAGGTT TAATGCTGTC AGAGTAGCAC TGTATCGTCT TCTATCATGG GGGCCTTTGT 12480
TGGCTTTAGG AGGTTTGTGT TTCATAGTAG TTTCCCAGTG GGCTCTTTGT TACCTGTAAT 12540
GAGTGTGACA GTTATGCCAT AACCAGGTTT TATATGGAAT ACAATTTTGA GAAAGTTCTT 12600
TCTAGGCAGA GAAGCTTATT TGAACCTCTT ATTATATTTG GGTTTCAGGC TTTTGAGTTC 12660
TTCTGAAATA ATAGCCCTTT GAAGGTAGCT ATTGCTATGA CTTCATTAAA TTCTAATGCC 12720
TCTGGTTTTC TCCCCCAGGT TTCTGCATAT GAAGTGTGTA AAATAGATTG CTTGATCCAA 12780
AACAGAAAAA CAGTGATAAC TGTTTTGCTG AGTTCCCAGA CCCTTCCCAA GATGGAACCA 12840
ATAACATTCA CAGCAAGGAA ACATCTGCTT TCTAACGAGG TCTCGGTGGA TTTTGGCCTG 12900
CAGCTGGTGG GCTCCCTGCC TGTGCATTCC CTGACCACCA TGCCCATGCT GCCCTGGGTT 12960
GTGGCTGAGG TGCGAAGACT CAGCAGGCAG TCCACCAGAA AGGAACCTGT AACCAAGCAA 13020
GTCCGGCTTT GCGTTTCACC CTCTGGACTG AGATGTGAAC CTGAGCCAGG GAGAAGTCAA 13080
CAGTGGGATC CCCTGATCTA TTCCAGCATC TTTGAGTGCA AGCCTCAGCG TGTTCACAAA 13140
CTGATTCACA ACAGTCATGA CCCAAGTTAC TTTGCTTGTC TGATTAAGGA AGACGCTGTC 13200
CACCGGCAGA GTATCTGCTA TGTGTTCAAA GCCGATGATC AAACAAAAGT AAGTGAGATG 13260
                      GTGGCTGGCT GGTTTTTATT GTATGGGGGT CAGGATATTT 13320
ATTTTAAGTA TACTGAAATG AATAAGGAAT TAATGCTGCA GTTATAAATT GATTACTTAG 13380
CTGAATTTTT GTTTTATGGT GATAGTTTAT AGTTTTAAAG CACATTTGAA AACAGATACG 13440
AGAAATTATC AGTTTTTGAG TTCAAAAATT CAAGAGAAAT CAGTCTAAAA CTACTAATTA 13500
AGAGCAGAAG TGTTAAGATG TACATTATTT CAGATGAATG TTCTAAAGCC ATGCCTCTCA 13560 
AACTGAAATG AGCTTGTGAG TCACCTGGGG ATCTTGTTAA AATGTGAATC TTGATTCAGT 13620 
AGGTCTGGGG TGGACCCCAA GACTGCATTT GTAACAAGCT GCCAAGAAAT GCTGATGCTG 13680 
CCCTTTTGCA GGTTGCACTT TGAGTGGCAA AGTTCTAAAT CTCCACATTT GTAATCCTAT 13740 
TAAGAAAAAT ATAGTCATTC GTAAACTGTG TAAAAATGCT ACTGGCCAGT TTCCCAAGGC 13800 
ATAATGTTCA CTTAGGCAAA GGTCATTGAT AAGAACGCTG GATATGCATC TAAGTTTTGA 13860 
TGCGATCAGG GGTTCTTTGT GTTTTTTTCT TTCGCAAACC TCAGGTCAGA TCTGATTAGC 13920 
TTGTTATTAT CACATGATAT GGCTGAAAAA AAATGTGAGA CATGGTAAAA GTTCTGCTCT 13980 
TTCCTCGTTC ATTTGTGCTT GCTTTGTTAT TAGCATTCGT TGTAGCTCTG GGCAGGACTC 14040 
ATTTGAAGAT GCTTGNCCCA TTTTATGAGG ATTAGCTTAG ATAAAATTGA AAATATAATG 14100 
CAAATAGCAA CTTTCTCAGT TGGGCTCAGG GCTCCACAGC TAACCCCATG GACTGTGGAG 14160 
TCTTGCCGTT GTTTTGGGTG CCAAGCAAGC CAAGTCACAT GTGATTCAAG CTGTCTGCCA 14220 
CATGTACAGG GCGAGGATGC GAGTGTCAAT CCACCTGTTA ACTGTCAGTG AAGCCTTGAA 14280
AGCTTCTCAT ATTTTCAAGG TTAAAATCTG GATAGAAATG CTAAAGTTTT CTCTCTGCAC 14340 
TCCATTAGGT TATTTTATGT ACTCTCTAGG GTGTAAGGAC CTTATTTAGA AATTAATATT 14400
CTTGGTATCA AGTAGATGCC CTTTTGCTTG TTCATTTGTT GGTTCTTCTA GTCATTCAGA 14460 
ATTGCTGTTG CAGGTACTGT TGGAGATGAT ATTAGCAGAG GCTTGTAGGA AGGCAGGAGC 14520 
ATCAGTGGCG AATAGGACCA GGTGATCTAT GTATAGGACA TAATGGAAGG ACTGAGAAGG 14580 
GAGCCTAACA ci'cACCCAAA GGGTAGAGAA GGCTTTGTGA AATAAAGGCT AATATGGAGC 14640 
TCAAAACCAC CATTTCACTC ACAGAATCAA ACTCTCATAT TATAAATCAT TTCATGTTAT 14700 
TGTCCACACA TCTCAAGTGG GCACGGCAGC ATCAGGCTTG GAGATTCAGA GGGACTAACT 14760 
TCCTGTACTC TAATCCTACT TCTGCACCCA TAAACTGGGT GGCCTCAGGC AATTGAGTCT 14820 
GTTTTCTTAT CTGTTAAATG GGGATAATTA CAGTATTTAT CCAATAGAGT TGCTGGAAAG 14880 
ACTAAATGAG GTAGCACTCG ACCTGAAACT TAGTAAGCAT TTATAGCCAT AAAAACATTT 14940 
TCATTCAAGA AAATTTTACT AGAGGCAGAT TATATGCTAA TTTCATTTCA CGTCTTAGGT 15000 
AAAAAGAAAC ATGATACCTA GATGAGTGCC TTCAGCTTTC AAAGATGAGA TTCTGGTCAT 15060 
ATTTGAGGAA CATTTTAAAA ACTACACGTA TAACTTAATG GCTCCTATTA TTTGGACAAA 15120 
TTCCAGAATG AAAATGAGAG GACTGAACAG CCTGTACCTC AGTCCAGCTC TATATAGTAT 15180 
TTGGACTGAA TTTCCTTGGG GAGAGTTTGT GCGTGGAATC GTTGTTCAGC ATTTTACACA 15240 
TTTGACTCTT TCCCAAAATC TTTTACGGCC ATCTGAGAAT AGGCTTCTGG CCAGTCATTC 15300 
GGATGCCTGA CAAGAGAAAG AGATTTATAA CCAAATTCTG TAATTGGGAC TTCCAGTCTT 15360 
TCCCCAAGTA GAGAATTGGA CTTACTCTAT ATGCTAAAAA CCCATGGTTG AAATATGAAT 15420 
TAGTTCTTAA GTGATTTTTG GCTTGCATAC CATTTTTGCA AACACAAATT GTCATTACTC 15480 
TGCTCATTTA ATAAAAGAAT AATTTGTAGT ATAGGTATAT ACCTCAATCA GTGATTTTGT 15540 
TGTTGGAAAC AGAACAGTAA ATCACACTGG CCATGATGCT AACAGCGTGA TAGATTTTCT 15600 
GTTCTTGGGA CACCAATGTC ACTGTATCTC ATAGCGAAGG ATTATCTGCT GTAGGAGCAT 15660 
TCTCTTGACT ACTTATAACA TTTGCTGGGT GAAATAATTC TCCAGGTTAA GGCCTCTTCT 15720 
AAACAGATGA GGTCAGCACT AACTGCATTT GCCAGAGAAG ACATATGCAT TTACTGCCAG 15780 
CATCATAAJ^C ACAAAACTAC AGTTTGCGAG GAAACCCTTT GACCAGCATC TAATTAATTC 15840 
ACTGAGTAAT GTCTTGGGAG AAGAGGCATG TAAAGGAACA ATTTTATAAG CATGCCATGA 15900 
GATTGTTTTC NNATTGTATG TTCCATAGAA TATGAGGAAA CTTCAAAACA TTTTGTGGAA 15960 
AAATTGAATT AAAAAGNNAA AAACACATAT ATACATAAGC TTTATTTCTC AAGATAAACT 16020 
TTATCAAGTT CAAGACACTT TTGTAAGCAA TGTTAACAGC CATTGAGTCG GTCTCTAAAG 16080 
AACTGAGGGT CCTGGGAATT TAACCATGTT TATACAGTCT TTTATACATT ATTAACTGGA 16140 
GAAAAATTGG CGCTCTTTAA AGATTTTTTA AAATTGAGAA GCAAAAGGAC GTCAGAAGGA 16200 
GCCAAATTAG GCCTGNNAAG TGGATGCCTA ATGATTTCCC ATGGAAACTC TTGCAAAATT 16260 
GCTCCTGTTT GATGAGAGGA ATGAGCAGGA ACATTGTCAT GGTGGACAAG GACTCTGGTG 16320 
AAGCTTTNNC AGGCGATTTT CTGCT.^GC TTTGGCTAAC TTTCTCAAAA CACTCTCATG 16380 
ATAAACAGAT GTTATCATTC TTTGGCCCTC CAGAAAGTCA ACAAACAAAA TGCCTTGGGC 16440 
ATCCCAAAAA ACTATTGCAA CCATTTGCCC TTGACCAGTC CACTTTCGCT TTGACTGGAC 16500 
CACTTCTGCT CTCAGTAGCC ATTGCTTAAA TTTGTCTTGA TCTTTAGGAT TGCGCTGGTA 16560 
AAACTATGTT TCATCACCTG TTACAATTCT TTGAAGAAAT GCTTCAGGAT CTTGATCCCA 16620 
CCCGTTTAAA ATTTCCATTA GAAACTCTGC TCTTGTCTGC AGCTGATCTG AGGGCAATGG 16680 
TTTTGGCACC CATCTAGTAA AACGTTTGCT CAGTGTTAAT TTTTCATCCA GGATTGTGTA 16740
AGCTGAACCA GCAGAGATGT CTATGATATT GGCTAGTTGG TCCTCTTCAA TGAGGGCATG 16800
AACAAGATGA ATATTTTCCT CAAACAATTA TCTGGATGGT CTGCTGCTGC AGGCTTCATC 16860
TTCAATATTG TCTCGTCCCT TCTTTTTCTT TTCCCCCCCG CTTGAGACAC AGTCTTGTTC 16920
TGTTGCCCAG GTTGGAGTGC AGTGGCCCGA TTTCGGCTCA CTGCAACCTC TGCCTCCCGG 16980
GCTCAAGCGA TTCTCCTGCC TCAGCCCACC AAGTAGCTGG GATTACAGGT ACACATGATC 17040
GTGCCTGGCT AATTTTTGTA TTTTTAGTAG AGACAGGGTT TCACCGTGTT GGCCAGGCTG 17100
GTCTCGAACT CCTGACCTCA AGTAATCCAC CTGCCTTGGC CTCCCAAAGT GCTGGGATTA 17160
TAAACATGAG CCACCACACC TGGCCTCATC CTTTCTTAAA ATGAGTTATA CATTTGTAAG 17220
CTGCTGATTT CTTTGGACAT TGTGCCTATA AACTTTTTGT AAAGCATCAG TGATTTCACC 17280
ATTCTTCCAC CCAAACTTCA CCATAAGTTT GATGTTTCTT CTTGCTTTGA TTTTAGCAGG 17340
ATTCATGTTT CTCTGATAGG GGGTCTTTTC AAACTGATGT CTTATCCTTC TTAGAGCCTC 17400
ATCCCAGATC CTGTTCAGAC ATGCTACAAG TTAATACAAG TTTATTTGGT GCCAAAAAAT 17460
GGAAATCCAT GCATAGTTTT TAAATAATAT GCATTTTTCA TGNACTTTTT GAAGACCCCT 17520
TGTATACTTA AACTGCTCCA CATGGAAAAG CTTCCATGAT CAAATGCAGT AAGGCAGCAT 17580
CTCAAACAT 17589



(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1999 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: DOUBLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 
GGACAGTATC TAGCACAATA CCCCAAATCG ACTAACTCCT CCGTAAAGAA TAGCTACCAC 60

TATTGTGAGA GTTTTAAGTC AAGCTGTGAA TAAAACTCTT GGGTCCACTT AAAAATACCT 120

CCCCTGGATG TAAGCATCCA GGGAAATCAG GGAATGCCAT AAGACAGCCC TAATCTAAAA 180 

GCCTACAAGA AGCTCAGTGG GCTTCAAGGA AGACACTGCT CTTGGTACGA TGAGGAAACC 240 

TGGCCCTCTA TTTGCCTCCT GGGCCACAGT AATATTGATA ATAGCTGCTG CTTTTAGTTG 300 

AGGACCATGT ACGTCTGTGT CACTGCACTG GCCACTTTAC TTACACTTTC CTGCTTTGTC 360 

CTCACAAAGA TCCTGTAAGG TGTGTATTGG TCCCATTTAG CAGGTAAGAC AATGAAGACC 420 

AGAGGTCCAG CACCTTGCCT AAACCACACC TGCTGGGATT TGGATTCAAG TCCAACCGTA 480

CAGCTCAAAC GCTCAGCCAC TTCCCTAAAG TCCACCCCCA GCTACATTAA GTAAAAAAAT 540 

CCAGAAAGAT GCCACCTGGG GGTCTGGAAC TGCCTCCTCC GAGCACCCGG CTCTCCCCTC 600 

CCTGCGGACT CTTCTCTGGA GAGGATGTGA TGCTTCTTAC TTTTCTCAGA TCCCTCTCCC 660 

CACCCTGCGA GTGACGTTGC GCCTCTGTGC CTGGTGGGAT AGGGATCTGG GAGCTTCGCC 720 

TGTTTTTTGC ACACTGCCAT CCCCTAGTCT TAGGGAGCGA GCTCTGTCCC GCTTTTCACA 780 

TCTCCGCGTC TTTCCTTGCA CTCTACATCA CCGCTGGGAA TGTCCCCAGA CCTGATCGGG 840 

GCATGCACAC TGGGGTGTGC GTGTGCGTGT GGTGTGTGTT CCTGCGCGTG TGCCGGGCTC 900 

GCGGGGCAGG AAAAAGCGCC TAATCCAGGC TCTGCGTCAC TCCCGCAATT GGTTAGAAAT 960 

GGAGTTTCCT GGTGTTTAAT CCCGGGAGGG CACTTCGCCT TCGTTGTTTC CCAGAGTCCC 1020 

TGATTTTCCT GCCTCGCATG CCAGCGCCCC ATAGGGCATC CGTGCCTCAG TTCACCTCTT 1080 

GCCATCCTCC AAGGACGGGG AGAAGGGGTA AGGCGGGGGA GAGCAAGGTG GCTTGGTCGC 1140 

CCCCGGCCCC CGCCCCCCAT GTTGTGTGCA GTTTCCACCA CGTCTGTTTC GGAGGGAGAA 1200 

GAGGAGGGTG CAGATGAGGC GAGGCGCCTT CGGGAGCGCG GAGAGCGGGC AGGCAGTGCC 1260

ACCTGCTGAG AGCCACTCAG GCCGAGCAAG CGGCGGGCAG TGCCACCTGC TATAAATAGG 1320 

CCGCCAAGGA CAGGGTGTGC GACTGTACAT CCCGCCACGA GGGCCTGCAT CACGCGCGGG 1380 

GCCCCGCGCC CCCGGCTCCC CAGGGAAACG CTGTGCCCAG ATCCTGCGCA GGGGTCTGGA 1440

TGGGGCGGCG GCCCGAGTAC TTCCCCCCTA TTCCCCCCAC ANACACTGGC TGAGGATGGC 1500 

CCGCGGGCTT GGGGGCGGGG GGTGGCAAGG AGGGGAGGGA GGCCGCGGCG GACCCGCAGT 1560 

GCAGCAGCTG TTGCTCGCGT GTGACTCGCC CGTCCGGGCC GTGCTGCCCA GGCACAGTCA 1620 

CACGGCGCAG TGGGGAGGAG GAGGACACCG AGTCCCCCTC CCAGCTCCCC GGGGACCGAG 1680 

TGGGGAGATC CCGGCTCCTG TCTTCCCCTC GCCTCCAGCG CGCTCGCCCA GGCTGGGAGG 1740

AGGAAACCAG AGCCGCGCGC AGACACCTCC TCCTTCTCCT CCTCTTCTTC CTCCTCCTCC 1800

TCCTCCTCCT CCTCCTCTTC GGCTGCTGCT CCTGGTGCCG CCACCGTCCG CCGGTGCCTG 1860 

TTGCTGCCGC CGCCGCGGGA CCTGCTGTGT CCTCAGCTGG GTGGAGAAGA GGCGGGCGCC 1920 

GAGCCGAGGG GAGCCCCCTC CCCGTCCCCC CGCGGCGGGA AGAGCGCAGC CAGCCGGGTG 1980 

CGATGGACTC CCCGCCCGC 1999

(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 3984 base pairs 
(B) TYPE: NUCLEIC ACID

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: DNA

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 

CCAGGCCGTC CCCAGGATGC CCCCAAGCAC CTGCGNGTCC CGGCCCGGCC CCGGGCTCTG 60 

AGCGCGCCGC GGCACAGGTT TCTGCATATG AAGTGTGTAA AATAGATTGC TTGATCCAAA 120 

ACAGAAAAAC AGTGATAACT GTTTTGCTGA GTTCCCAGAC CCTTCCCAAG ATG GAA 176
Met Glu
1

CCA ATA ACA TTC ACA GCA AGG AAA CAT CTG CTT CCT AAC GAG GTC TCG 224
Pro Ile Thr Phe Thr Ala Arg Lys His Leu Leu Pro Asn Glu Val Ser
5 10 15 

GTG GAT TTT GGC CTG CAG CTG GTG GGC TCC CTG CCT GTG CAT TCC CTG 272 
Val Asp Phe Gly Leu Gln Leu Val Gly Ser Leu Pro Val His Ser Leu
20 25 30 



ACC ACC ATG CCC ATG CTG CCC TGG GTT GTG GCT GAG GTG CGA AGA CTC 320
Thr Thr Met Pro Met Leu Pro Trp Val Val Ala Glu Val Arg Arg Leu
35 40 45 50

AGC AGG CAG TCC ACC AGA AAG GAA CCT GTA ACC AAG CAA GTC CGG CTT 368
Ser Arg Gln Ser Thr Arg Lys Glu Pro Val Thr Lys Gln Val Arg Leu
55 60 65

TGC GTT TCA CCC TCT GGA CTG AGA TGT GAA CCT GAG CCA GGG AGA AGT 416
Cys Val Ser Pro Ser Gly Leu Arg Cys Glu Pro Glu Pro Gly Arg Ser
70 75 80

CAA CAG TGG GAT CCC CTG ATC TAT TCC AGC ATC TTT GAG TGC AAG CCT 464
Gln Gln Trp Asp Pro Leu Ile Tyr Ser Ser Ile Phe Glu Cys Lys Pro
85 90 95

CAG CGT GTT CAC AAA CTG ATT CAC AAC AGT CAT GAC CCA AGT TAC TTT 512
Gln Arg Val His Lys Leu Ile His Asn Ser His Asp Pro Ser Tyr Phe
100 105 110

GCT TGT CTG ATT AAG GAA GAC GCT GTC CAC CGG CAG AGT ATC TGC TAT 560
Ala Cys Leu Ile Lys Glu Asp Ala Val His Arg Gln Ser Ile Cys Tyr
115 120 125 130

GTG TTC AAA GCC GAT GAT CAA ACA AAA GTG CCT GAG ATC ATC AGC TCC 608
Val Phe Lys Ala Asp Asp Gln Thr Lys Val Pro Glu Ile Ile Ser Ser
135 140 145

ATC CGT CAG GCG GGG AAG ATC GCC CGG CAG GAG GAG CTG CAC TGC CCG 656
Ile Arg Gln Ala Gly Lys Ile Ala Arg Gln Glu Glu Leu His Cys Pro
150 155 160

TCC GAG TTC GAC GAC ACG TTT TCC AAG AAG TTC GAG GTG CTC TTC TGC 704
Ser Glu Phe Asp Asp Thr Phe Ser Lys Lys Phe Glu Val Leu Phe Cys
165 170 175

GGC CGC GTG ACG CTG GCG CAC AAG AAG GCT CCG CCG GCC CTG ATC GAC 752
Gly Arg Val Thr Val Ala His Lys Lys Ala Pro Pro Ala Leu Ile Asp
180 185 190

GAG TGC ATC GAG AAG TTC AAT CAC GTC AGC GGC AGC CGG GGG TCC GAG 800
Glu Cys Ile Glu Lys Phe Asn His Val Ser Gly Ser Arg Gly Ser Glu
195 200 205 210

AGC CCC CGC CCC AAC CCG CCC CAT GCC GCG CCC ACA GGG AGC CAG GAG 848 
Ser Pro Arg Pro Asn Pro Pro His Ala Ala Pro Thr Gly Ser Gin Glu 
215 220 225 



CCT GTG CGC AGG CCC ATG CGC AAG TCC TTC TCC CAG CCC GGC CTG CGC 896 
Pro Val Arg Arg Pro Met Arg Lys Ser Phe Ser Gin Pro Gly Leu Arg 
230 235 240 

TCG CTG GCC TTT AGG AAG GAG CTG CAG GAT GGG GGC CTC CGA AGC AGC 944 
Ser Leu Ala Phe Arg Lys Glu Leu Gin Asp Gly Gly Leu Arg Ser Ser 
245 250 255 

GGC TTC TTC AGC TCC TTC GAG GAG AGC GAC ATT GAG AAC CAC CTC ATT 992
Gly Phe Phe Ser Ser Phe Glu Glu Ser Asp Ile Glu Asn His Leu Ile
260 265 270

AGC GGA CAC AAT ATT GTG CAG CCC ACA GAT ATC GAG GAA AAT CGA ACT 1040
Ser Gly His Asn Ile Val Gln Pro Thr Asp Ile Glu Glu Asn Arg Thr
275 280 285 290

ATG CTC TTC ACG ATT GGC CAG TCT GAA GTT TAC CTC ATC AGT CCT GAC 1088
Met Leu Phe Thr Ile Gly Gln Ser Glu Val Tyr Leu Ile Ser Pro Asp
295 300 305

ACC AAA AAA ATA GCA TTG GAG AAA AAT TTT AAG GAG ATA TCC TTT TGC 1136
Thr Lys Lys Ile Ala Leu Glu Lys Asn Phe Lys Glu Ile Ser Phe Cys
310 315 320

TCT CAG GGC ATC AGA CAC GTG GAC CAC TTT GGG TTT ATC TGT CGG GAG 1184
Ser Gln Gly Ile Arg His Val Asp His Phe Gly Phe Ile Cys Arg Glu
325 330 335

TCT TCC GGA GGT GGC GGC TTT CAT TTT GTC TGT TAC GTG TTT CAG TGC 1232
Ser Ser Gly Gly Gly Gly Phe His Phe Val Cys Tyr Val Phe Gln Cys
340 345 350

ACA AAT GAG GCT CTG GTT GAT GAA ATT ATG ATG ACC CTG AAA CAG GCC 1280
Thr Asn Glu Ala Leu Val Asp Glu Ile Met Met Thr Leu Lys Gln Ala
355 360 365 370

TTC ACG GTG GCC GCA GTG CAG CAG ACA GCT AAG GCG CCA GCC CAG CTG 1328
Phe Thr Val Ala Ala Val Gln Gln Thr Ala Lys Ala Pro Ala Gln Leu
375 380 385

TGT GAG GGC TGC CCC CTG CAA AGC CTG CAC AAG CTC TGT GAG AGG ATA 1376
Cys Glu Gly Cys Pro Leu Gln Ser Leu His Lys Leu Cys Glu Arg Ile
390 395 400

GAG GGA ATG AAT TCT TCC AAA ACA AAA CTA GAA CTG CAA AAG CAC CTG 1424
Glu Gly Met Asn Ser Ser Lys Thr Lys Leu Glu Leu Gln Lys His Leu
405 410 415

ACG ACA TTA ACC AAT CAG GAG CAG GCG ACT ATT TTT GAA GAG GTT CAG 1472
Thr Thr Leu Thr Asn Gln Glu Gln Ala Thr Ile Phe Glu Glu Val Gln
420 425 430

AAA TTG AGA CCG AGA AAT GAG CAG CGA GAG AAT GAA TTG ATT ATT TCT 1520
Lys Leu Arg Pro Arg Asn Glu Gln Arg Glu Asn Glu Leu Ile Ile Ser
435 440 445 450

TTT CTG AGA TGT TTA TAT GAA GAG AAA CAG AAA GAA CAC ATC CAT ATT 1568
Phe Leu Arg Cys Leu Tyr Glu Glu Lys Gln Lys Glu His Ile His Ile
455 460 465

GGG GAG ATG AAG CAG ACA TCG CAG ATG GCA GCA GAG AAT ATT GGA AGT 1616
Gly Glu Met Lys Gln Thr Ser Gln Met Ala Ala Glu Asn Ile Gly Ser
470 475 480

GAA TTA CCA CCC AGT GCC ACT CGA TTT AGG CTA GAT ATG CTG AAA AAC 1664
Glu Leu Pro Pro Ser Ala Thr Arg Phe Arg Leu Asp Met Leu Lys Asn
485 490 495

AAA GCA AAG AGA TCT TTA ACA GAG TCT TTA GAA AGT ATT TTG TCC CGG 1712
Lys Ala Lys Arg Ser Leu Thr Glu Ser Leu Glu Ser Ile Leu Ser Arg
500 505 510

GGT AAT AAA GCC AGA GGC CTG CAG GAA CAC TCC ATC AGT GTG GAT CTG 1760
Gly Asn Lys Ala Arg Gly Leu Gln Glu His Ser Ile Ser Val Asp Leu
515 520 525 530

GAT AGC TCC CTG TCT AGT ACA TTA AGT AAC ACC AGC AAA GAG CCA TCT 1808
Asp Ser Ser Leu Ser Ser Thr Leu Ser Asn Thr Ser Lys Glu Pro Ser
535 540 545

GTG TGT GAA AAG GAG GCC TTG CCC ATC TCT GAG AGC TCC TTT AAG CTC 1856
Val Cys Glu Lys Glu Ala Leu Pro Ile Ser Glu Ser Ser Phe Lys Leu
550 555 560

CTC GGC TCC TCG GAG GAC CTG TCC AGT GAC TCG GAG AGT CAT CTC CCA 1904
Leu Gly Ser Ser Glu Asp Leu Ser Ser Asp Ser Glu Ser His Leu Pro
565 570 575

GAA GAG CCA GCT CCG CTG TCG CCC CAG CAG GCC TTC AGG AGG CGA GCA 1952
Glu Glu Pro Ala Pro Leu Ser Pro Gln Gln Ala Phe Arg Arg Arg Ala
580 585 590

AAC ACC CTG AGT CAC TTC CCC ATC GAA TGC CAG GAA CCT CCA CAA CCT 2000
Asn Thr Leu Ser His Phe Pro Ile Glu Cys Gln Glu Pro Pro Gln Pro
595 600 605 610

GCC CGG GGG TCC CCG GGG GTT TCG CAA AGG AAA CTT ATG AGG TAT CAC 2048
Ala Arg Gly Ser Pro Gly Val Ser Gln Arg Lys Leu Met Arg Tyr His
615 620 625

TCA GTG AGC ACA GAG ACG CCT CAT GAA CGA AAG GAC TTT GAA TCC AAA 2096
Ser Val Ser Thr Glu Thr Pro His Glu Arg Lys Asp Phe Glu Ser Lys
630 635 640

GCA AAC CAT CTT GGT GAT TCT GGT GGG ACT CCT GTG AAG ACC CGG AGG 2144
Ala Asn His Leu Gly Asp Ser Gly Gly Thr Pro Val Lys Thr Arg Arg
645 650 655

CAT TCC TGG AGG CAG CAG ATA TTC CTC CGA GTA GCC ACC CCG CAG AAG 2192
His Ser Trp Arg Gln Gln Ile Phe Leu Arg Val Ala Thr Pro Gln Lys
660 665 670

GCG TGC GAT TCT TCC AGC AGA TAT GAA GAT TAT TCA GAG CTG GGA GAG 2240
Ala Cys Asp Ser Ser Ser Arg Tyr Glu Asp Tyr Ser Glu Leu Gly Glu
675 680 685 690

CTT CCC CCA CGA TCT CCT TTA GAA CCA GTT TGT GAA GAT GGG CCC TTT 2288
Leu Pro Pro Arg Ser Pro Leu Glu Pro Val Cys Glu Asp Gly Pro Phe
695 700 705

GGC CCC CCA CCA GAG GAA AAG AAA AGG ACA TCT CGT GAG CTC CGA GAG 2336
Gly Pro Pro Pro Glu Glu Lys Lys Arg Thr Ser Arg Glu Leu Arg Glu
710 715 720

CTG TGG CAA AAG GCT ATT CTT CAA CAG ATA CTG CTG CTT AGA ATG GAG 2384
Leu Trp Gln Lys Ala Ile Leu Gln Gln Ile Leu Leu Leu Arg Met Glu
725 730 735

AAG GAA AAT CAG AAG CTC CAA GCC TCT GAA AAT GAT TTG CTG AAC AAG 2432
Lys Glu Asn Gln Lys Leu Gln Ala Ser Glu Asn Asp Leu Leu Asn Lys
740 745 750

CGC CTG AAG CTC GAT TAT GAA GAA ATT ACT CCC TGT CTT AAA GAA GTA 2480
Arg Leu Lys Leu Asp Tyr Glu Glu Ile Thr Pro Cys Leu Lys Glu Val
755 760 765 770

ACT ACA GTG TGG GAA AAG ATG CTT AGC ACT CCA GGA AGA TCA AAA ATT 2528
Thr Thr Val Trp Glu Lys Met Leu Ser Thr Pro Gly Arg Ser Lys Ile
775 780 785

AAG TTT GAC ATG GAA AAA ATG CAC TCG GCT GTT GGG CAA GGT GTG CCA 2576
Lys Phe Asp Met Glu Lys Met His Ser Ala Val Gly Gln Gly Val Pro
790 795 800

CGT CAT CAC CGA GGT GAA ATC TGG AAA TTT CTA GCT GAG CAA TTC CAC 2624
Arg His His Arg Gly Glu Ile Trp Lys Phe Leu Ala Glu Gln Phe His
805 810 815

CTT AAA CAC CAG TTT CCC AGC AAA CAG CAG CCA AAG GAT GTG CCA TAC 2672
Leu Lys His Gln Phe Pro Ser Lys Gln Gln Pro Lys Asp Val Pro Tyr
820 825 830

AAA GAA CTC TTA AAG CAG CTG ACT TCC CAG CAG CAT GCG ATT CTT ATT 2720
Lys Glu Leu Leu Lys Gln Leu Thr Ser Gln Gln His Ala Ile Leu Ile
835 840 845 850

GAC CTT GGG CGA ACC TTT CCT ACA CAC CCA TAC TTC TCT GCC CAG CTT 2768
Asp Leu Gly Arg Thr Phe Pro Thr His Pro Tyr Phe Ser Ala Gln Leu
855 860 865

GGA GCA GGA CAG CTA TCG CTT TAC AAC ATT TTG AAG GCC TAC TCA CTT 2816
Gly Ala Gly Gln Leu Ser Leu Tyr Asn Ile Leu Lys Ala Tyr Ser Leu
870 875 880

CTA GAC CAG GAA GTG GGA TAT TGC CAA GGT CTC AGC TTT GTA GCA GGC 2864
Leu Asp Gln Glu Val Gly Tyr Cys Gln Gly Leu Ser Phe Val Ala Gly
885 890 895

ATT TTG CTT CTT CAT ATG AGT GAG GAA GAG GCG TTT AAA ATG CTC AAG 2912
Ile Leu Leu Leu His Met Ser Glu Glu Glu Ala Phe Lys Met Leu Lys
900 905 910

TTT CTG ATG TTT GAC ATG GGG CTG CGG AAA CAG TAT CGG CCA GAC ATG 2960
Phe Leu Met Phe Asp Met Gly Leu Arg Lys Gln Tyr Arg Pro Asp Met
915 920 925 930

ATT ATT TTA CAG ATC CAG ATG TAC CAG CTC TCG AGG TTG CTT CAT GAT 3008
Ile Ile Leu Gln Ile Gln Met Tyr Gln Leu Ser Arg Leu Leu His Asp
935 940 945

TAC CAC AGA GAC CTC TAC AAT CAC CTG GAG GAG CAC GAG ATC GGC CCC 3056
Tyr His Arg Asp Leu Tyr Asn His Leu Glu Glu His Glu Ile Gly Pro
950 955 960

AGC CTC TAC GCT GCC CCC TGG TTC CTC ACC ATG TTT GCC TCA CAG TTC 3104
Ser Leu Tyr Ala Ala Pro Trp Phe Leu Thr Met Phe Ala Ser Gln Phe
965 970 975

CCG CTG GGA TTC GTA GCC AGA GTC TTT GAT ATG ATT TTT CTT CAG GGA 3152
Pro Leu Gly Phe Val Ala Arg Val Phe Asp Met Ile Phe Leu Gln Gly
980 985 990

ACA GAG GTC ATA TTT AAA GTG GCT TTA AGT CTG TTG GGA AGC CAT AAG 3200
Thr Glu Val Ile Phe Lys Val Ala Leu Ser Leu Leu Gly Ser His Lys
995 1000 1005 1010

CCC TTG ATT CTG CAG CAT GAA AAC CTA GAA ACC ATA GTT GAC TTT ATA 3248
Pro Leu Ile Leu Gln His Glu Asn Leu Glu Thr Ile Val Asp Phe Ile
1015 1020 1025

AAA AGC ACG CTA CCC AAC CTT GGC TTG GTA CAG ATG GAA AAG ACC ATC 3296
Lys Ser Thr Leu Pro Asn Leu Gly Leu Val Gln Met Glu Lys Thr Ile
1030 1035 1040

AAT CAG GTA TTT GAA ATG GAC ATC GCT AAA CAG TTA CAA GCT TAT GAA 3344
Asn Gln Val Phe Glu Met Asp Ile Ala Lys Gln Leu Gln Ala Tyr Glu
1045 1050 1055

GTT GAG TAC CAC GTC CTT CAA GAA GAA CTT ATC GAT TCC TCT CCT CTC 3392
Val Glu Tyr His Val Leu Gln Glu Glu Leu Ile Asp Ser Ser Pro Leu
1060 1065 1070

AGT GAC AAC CAA AGA ATG GAT AAA TTA GAG AAA ACC AAC AGC AGC TTA 3440
Ser Asp Asn Gln Arg Met Asp Lys Leu Glu Lys Thr Asn Ser Ser Leu
1075 1080 1085 1090

CGC AAA CAG AAC CTT GAC CTC CTT GAA CAG TTG CAG GTG GCA AAT GGT 3488
Arg Lys Gln Asn Leu Asp Leu Leu Glu Gln Leu Gln Val Ala Asn Gly
1095 1100 1105

AGG ATC CAA AGC CTT GAG GCC ACC ATT GAG AAG CTC CTG AGC AGT GAG 3536
Arg Ile Gln Ser Leu Glu Ala Thr Ile Glu Lys Leu Leu Ser Ser Glu
1110 1115 1120

AGC AAG CTG AAG CAG GCC ATG CTT ACC TTA GAA CTG GAG CGG TCG GCC 3584
Ser Lys Leu Lys Gln Ala Met Leu Thr Leu Glu Leu Glu Arg Ser Ala
1125 1130 1135

CTG CTG CAG ACG GTG GAG GAG CTG CGG CGG CGG AGC GCA GAG CCC AGC 3632
Leu Leu Gln Thr Val Glu Glu Leu Arg Arg Arg Ser Ala Glu Pro Ser
1140 1145 1150

GAC CGG GAG CCT GAG TGC ACG CAG NCC CGA GCC CAC GGG CGA CTG ACA 3680
Asp Arg Glu Pro Glu Cys Thr Gln Xaa Arg Ala His Gly Arg Leu Thr
1155 1160 1165 1170

GCT CTG CAG GAG AGA TTG CAA CAC CAT CCC ACA CTG TCC AGG CCT TAA 3728
Ala Leu Gln Glu Arg Leu Gln His His Pro Thr Leu Ser Arg Pro *
1175 1180 1185

CTGAGAGGGA CAGAAGACGC TGGAAGGAGA GAAGGAAGCG GGAAGTGTGC TTCTCAGGGA 3788

GGAAACCGGC TTGCCAGCAA GTAGATTCTT ACGAACTCCA ACTTGCAATT CAGGGGGCAT 3848

GTCCCAGTGT TTTTTTTGTT GTTTTTAGAT ACTAAATCGT CCCTTCTCCA GTCCTGATTA 3908

CTGTACACAG TAGCTTTAGA TGGCGTGGAC GTGAATAAAT GCAACTTATG TTTTAAAAAA 3968

AAAAAAAAAA AAAAAA 3984
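
The running position counts and the translation printed under SEQ ID NO: 3 can be cross-checked with a short script. The following is a minimal sketch, assuming the standard genetic code and assuming that the coding region runs from the ATG at positions 171..173 to the TAA ending at position 3728, as read from the position counts in the listing above. The names `CODON_TABLE` and `translate` are illustrative only and are not part of the listing.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption: no
# alternative code applies to this nuclear human sequence).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA string codon by codon; unknown codons become 'X'."""
    dna = dna.replace(" ", "").upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3)
    )


# Coordinates read from the listing: ATG at 171..173, TAA ending at 3728.
ORF_START, ORF_END = 171, 3728
codon_count = (ORF_END - ORF_START + 1) // 3
residue_count = codon_count - 1  # the stop codon encodes no residue

first_codons = translate("ATG GAA CCA ATA ACA TTC")  # start of the ORF
last_codons = translate("AGG CCT TAA")               # end of the ORF
ambiguous = translate("CAG NCC CGA")                 # codon containing N

print(residue_count, first_codons, last_codons, ambiguous)
```

Under these assumptions the residue count agrees with the 1185 amino acids given for SEQ ID NO: 5, and the codon containing the ambiguous base N is reported as an unknown residue, which corresponds to the Xaa shown in the listing.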



(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 3989 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: homology with EST in SEQID6

(B) LOCATION: 1..458



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 

ATAATAGGCA CTGAAGACAT GTTAATGGAA GGTGGATTTG TGATTCAGAA CCTCTAGACT 60 

ACCTGGGCGA GTCTTTTAAA ATGTTTCTGC ATATGAAGTG TGTAAAATAG ATTGCTTGAT 120 

CCAAAACAGA AAAACAGTGA TAACTGTTTT GCTGAGTTCC CAGACCCTTC CCAAG ATG 178
Met
1

GAA CCA ATA ACA TTC ACA GCA AGG AAA CAT CTG CTT CCT AAC GAG GTC 226
Glu Pro Ile Thr Phe Thr Ala Arg Lys His Leu Leu Pro Asn Glu Val
5 10 15

TCG GTG GAT TTT GGC CTG CAG CTG GTG GGC TCC CTG CCT GTG CAT TCC 274
Ser Val Asp Phe Gly Leu Gln Leu Val Gly Ser Leu Pro Val His Ser
20 25 30

CTG ACC ACC ATG CCC ATG CTG CCC TGG GTT GTG GCT GAG GTG CGA AGA 322
Leu Thr Thr Met Pro Met Leu Pro Trp Val Val Ala Glu Val Arg Arg
35 40 45

CTC AGC AGG CAG TCC ACC AGA AAG GAA CCT GTA ACC AAG CAA GTC CGG 370
Leu Ser Arg Gln Ser Thr Arg Lys Glu Pro Val Thr Lys Gln Val Arg
50 55 60 65

CTT TGC GTT TCA CCC TCT GGA CTG AGA TGT GAA CCT GAG CCA GGG AGA 418
Leu Cys Val Ser Pro Ser Gly Leu Arg Cys Glu Pro Glu Pro Gly Arg
70 75 80

AGT CAA CAG TGG GAT CCC CTG ATC TAT TCC AGC ATC TTT GAG TGC AAG 466
Ser Gln Gln Trp Asp Pro Leu Ile Tyr Ser Ser Ile Phe Glu Cys Lys
85 90 95

CCT CAG CGT GTT CAC AAA CTG ATT CAC AAC AGT CAT GAC CCA AGT TAC 514
Pro Gln Arg Val His Lys Leu Ile His Asn Ser His Asp Pro Ser Tyr
100 105 110

TTT GCT TGT CTG ATT AAG GAA GAC GCT GTC CAC CGG CAG AGT ATC TGC 562
Phe Ala Cys Leu Ile Lys Glu Asp Ala Val His Arg Gln Ser Ile Cys
115 120 125

TAT GTG TTC AAA GCC GAT GAT CAA ACA AAA GTG CCT GAG ATC ATC AGC 610
Tyr Val Phe Lys Ala Asp Asp Gln Thr Lys Val Pro Glu Ile Ile Ser
130 135 140 145

TCC ATC CGT CAG GCG GGG AAG ATC GCC CGG CAG GAG GAG CTG CAC TGC 658
Ser Ile Arg Gln Ala Gly Lys Ile Ala Arg Gln Glu Glu Leu His Cys
150 155 160

CCG TCC GAG TTC GAC GAC ACG TTT TCC AAG AAG TTC GAG GTG CTC TTC 706
Pro Ser Glu Phe Asp Asp Thr Phe Ser Lys Lys Phe Glu Val Leu Phe
165 170 175

TGC GGC CGC GTG ACG GTG GCG CAC AAG AAG GCT CCG CCG GCC CTG ATC 754
Cys Gly Arg Val Thr Val Ala His Lys Lys Ala Pro Pro Ala Leu Ile
180 185 190

GAC GAG TGC ATC GAG AAG TTC AAT CAC GTC AGC GGC AGC CGG GGG TCC 802
Asp Glu Cys Ile Glu Lys Phe Asn His Val Ser Gly Ser Arg Gly Ser
195 200 205

GAG AGC CCC CGC CCC AAC CCG CCC CAT GCC GCG CCC ACA GGG AGC CAG 850
Glu Ser Pro Arg Pro Asn Pro Pro His Ala Ala Pro Thr Gly Ser Gln
210 215 220 225

GAG CCT GTG CGC AGG CCC ATG CGC AAG TCC TTC TCC CAG CCC GGC CTG 898
Glu Pro Val Arg Arg Pro Met Arg Lys Ser Phe Ser Gln Pro Gly Leu
230 235 240

CGC TCG CTG GCC TTT AGG AAG GAG CTG CAG GAT GGG GGC CTC CGA AGC 946
Arg Ser Leu Ala Phe Arg Lys Glu Leu Gln Asp Gly Gly Leu Arg Ser
245 250 255

AGC GGC TTC TTC AGC TCC TTC GAG GAG AGC GAC ATT GAG AAC CAC CTC 994
Ser Gly Phe Phe Ser Ser Phe Glu Glu Ser Asp Ile Glu Asn His Leu
260 265 270

ATT AGC GGA CAC AAT ATT GTG CAG CCC ACA GAT ATC GAG GAA AAT CGA 1042
Ile Ser Gly His Asn Ile Val Gln Pro Thr Asp Ile Glu Glu Asn Arg
275 280 285

ACT ATG CTC TTC ACG ATT GGC CAG TCT GAA GTT TAC CTC ATC AGT CCT 1090
Thr Met Leu Phe Thr Ile Gly Gln Ser Glu Val Tyr Leu Ile Ser Pro
290 295 300 305

GAC ACC AAA AAA ATA GCA TTG GAG AAA AAT TTT AAG GAG ATA TCC TTT 1138
Asp Thr Lys Lys Ile Ala Leu Glu Lys Asn Phe Lys Glu Ile Ser Phe
310 315 320

TGC TCT CAG GGC ATC AGA CAC GTG GAC CAC TTT GGG TTT ATC TGT CGG 1186
Cys Ser Gln Gly Ile Arg His Val Asp His Phe Gly Phe Ile Cys Arg
325 330 335

GAG TCT TCC GGA GGT GGC GGC TTT CAT TTT GTC TGT TAC GTG TTT CAG 1234
Glu Ser Ser Gly Gly Gly Gly Phe His Phe Val Cys Tyr Val Phe Gln
340 345 350

TGC ACA AAT GAG GCT CTG GTT GAT GAA ATT ATG ATG ACC CTG AAA CAG 1282
Cys Thr Asn Glu Ala Leu Val Asp Glu Ile Met Met Thr Leu Lys Gln
355 360 365

GCC TTC ACG GTG GCC GCA GTG CAG CAG ACA GCT AAG GCG CCA GCC CAG 1330
Ala Phe Thr Val Ala Ala Val Gln Gln Thr Ala Lys Ala Pro Ala Gln
370 375 380 385

CTG TGT GAG GGC TGC CCC CTG CAA AGC CTG CAC AAG CTC TGT GAG AGG 1378
Leu Cys Glu Gly Cys Pro Leu Gln Ser Leu His Lys Leu Cys Glu Arg
390 395 400

ATA GAG GGA ATG AAT TCT TCC AAA ACA AAA CTA GAA CTG CAA AAG CAC 1426
Ile Glu Gly Met Asn Ser Ser Lys Thr Lys Leu Glu Leu Gln Lys His
405 410 415

CTG ACG ACA TTA ACC AAT CAG GAG CAG GCG ACT ATT TTT GAA GAG GTT 1474
Leu Thr Thr Leu Thr Asn Gln Glu Gln Ala Thr Ile Phe Glu Glu Val
420 425 430

CAG AAA TTG AGA CCG AGA AAT GAG CAG CGA GAG AAT GAA TTG ATT ATT 1522
Gln Lys Leu Arg Pro Arg Asn Glu Gln Arg Glu Asn Glu Leu Ile Ile
435 440 445

TCT TTT CTG AGA TGT TTA TAT GAA GAG AAA CAG AAA GAA CAC ATC CAT 1570
Ser Phe Leu Arg Cys Leu Tyr Glu Glu Lys Gln Lys Glu His Ile His
450 455 460 465

ATT GGG GAG ATG AAG CAG ACA TCG CAG ATG GCA GCA GAG AAT ATT GGA 1618
Ile Gly Glu Met Lys Gln Thr Ser Gln Met Ala Ala Glu Asn Ile Gly
470 475 480

AGT GAA TTA CCA CCC AGT GCC ACT CGA TTT AGG CTA GAT ATG CTG AAA 1666
Ser Glu Leu Pro Pro Ser Ala Thr Arg Phe Arg Leu Asp Met Leu Lys
485 490 495

AAC AAA GCA AAG AGA TCT TTA ACA GAG TCT TTA GAA AGT ATT TTG TCC 1714
Asn Lys Ala Lys Arg Ser Leu Thr Glu Ser Leu Glu Ser Ile Leu Ser
500 505 510

CGG GGT AAT AAA GCC AGA GGC CTG CAG GAA CAC TCC ATC AGT GTG GAT 1762
Arg Gly Asn Lys Ala Arg Gly Leu Gln Glu His Ser Ile Ser Val Asp
515 520 525

CTG GAT AGC TCC CTG TCT AGT ACA TTA AGT AAC ACC AGC AAA GAG CCA 1810
Leu Asp Ser Ser Leu Ser Ser Thr Leu Ser Asn Thr Ser Lys Glu Pro
530 535 540 545

TCT GTG TGT GAA AAG GAG GCC TTG CCC ATC TCT GAG AGC TCC TTT AAG 1858
Ser Val Cys Glu Lys Glu Ala Leu Pro Ile Ser Glu Ser Ser Phe Lys
550 555 560

CTC CTC GGC TCC TCG GAG GAC CTG TCC AGT GAC TCG GAG AGT CAT CTC 1906
Leu Leu Gly Ser Ser Glu Asp Leu Ser Ser Asp Ser Glu Ser His Leu
565 570 575

GGA ACA GAG GTC ATA TTT AAA GTG GCT TTA AGT CTG TTG GGA AGC CAT 3202
Gly Thr Glu Val Ile Phe Lys Val Ala Leu Ser Leu Leu Gly Ser His
995 1000 1005

AAG CCC TTG ATT CTG CAG CAT GAA AAC CTA GAA ACC ATA GTT GAC TTT 3250
Lys Pro Leu Ile Leu Gln His Glu Asn Leu Glu Thr Ile Val Asp Phe
1010 1015 1020 1025

ATA AAA AGC ACG CTA CCC AAC CTT GGC TTG GTA CAG ATG GAA AAG ACC 3298
Ile Lys Ser Thr Leu Pro Asn Leu Gly Leu Val Gln Met Glu Lys Thr
1030 1035 1040

ATC AAT CAG GTA TTT GAA ATG GAC ATC GCT AAA CAG TTA CAA GCT TAT 3346
Ile Asn Gln Val Phe Glu Met Asp Ile Ala Lys Gln Leu Gln Ala Tyr
1045 1050 1055

GAA GTT GAG TAC CAC GTC CTT CAA GAA GAA CTT ATC GAT TCC TCT CCT 3394
Glu Val Glu Tyr His Val Leu Gln Glu Glu Leu Ile Asp Ser Ser Pro
1060 1065 1070

CTC AGT GAC AAC CAA AGA ATG GAT AAA TTA GAG AAA ACC AAC AGC AGC 3442
Leu Ser Asp Asn Gln Arg Met Asp Lys Leu Glu Lys Thr Asn Ser Ser
1075 1080 1085

TTA CGC AAA CAG AAC CTT GAC CTC CTT GAA CAG TTG CAG GTG GCA AAT 3490
Leu Arg Lys Gln Asn Leu Asp Leu Leu Glu Gln Leu Gln Val Ala Asn
1090 1095 1100 1105

GGT AGG ATC CAA AGC CTT GAG GCC ACC ATT GAG AAG CTC CTG AGC AGT 3538
Gly Arg Ile Gln Ser Leu Glu Ala Thr Ile Glu Lys Leu Leu Ser Ser
1110 1115 1120

GAG AGC AAG CTG AAG CAG GCC ATG CTT ACC TTA GAA CTG GAG CGG TCG 3586
Glu Ser Lys Leu Lys Gln Ala Met Leu Thr Leu Glu Leu Glu Arg Ser
1125 1130 1135

GCC CTG CTG CAG ACG GTG GAG GAG CTG CGG CGG CGG AGC GCA GAG CCC 3634
Ala Leu Leu Gln Thr Val Glu Glu Leu Arg Arg Arg Ser Ala Glu Pro
1140 1145 1150

AGC GAC CGG GAG CCT GAG TGC ACG CAG NCC CGA GCC CAC GGG CGA CTG 3682
Ser Asp Arg Glu Pro Glu Cys Thr Gln Xaa Arg Ala His Gly Arg Leu
1155 1160 1165

ACA GCT CTG CAG GAG AGA TTG CAA CAC CAT CCC ACA CTG TCC AGG CCT 3730
Thr Ala Leu Gln Glu Arg Leu Gln His His Pro Thr Leu Ser Arg Pro
1170 1175 1180 1185

TAA CTGAGAGGGA CAGAAGACGC TGGAAGGAGA GAAGGAAGCG GGAAGTGTGC 3783
*

TTCTCAGGGA GGAAACCGGC TTGCCAGCAA GTAGATTCTT ACGAACTCCA ACTTGCAATT 3843

CAGGGGGCAT GTCCCAGTGT TTTTTTTGTT GTTTTTAGAT ACTAAATCGT CCCTTCTCCA 3903

GTCCTGATTA CTGTACACAG TAGCTTTAGA TGGCGTGGAC GTGAATAAAT GCAACTTATG 3963

TTTTAAAAAA AAAAAAAAAA AAAAAA 3989



(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1185 amino acids 

(B) TYPE: AMINO ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: protein 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 



Met Glu Pro Ile Thr Phe Thr Ala Arg Lys His Leu Leu Pro Asn Glu
1 5 10 15

Val Ser Val Asp Phe Gly Leu Gln Leu Val Gly Ser Leu Pro Val His
20 25 30

Ser Leu Thr Thr Met Pro Met Leu Pro Trp Val Val Ala Glu Val Arg
35 40 45

Arg Leu Ser Arg Gln Ser Thr Arg Lys Glu Pro Val Thr Lys Gln Val
50 55 60

Arg Leu Cys Val Ser Pro Ser Gly Leu Arg Cys Glu Pro Glu Pro Gly
65 70 75 80

Arg Ser Gln Gln Trp Asp Pro Leu Ile Tyr Ser Ser Ile Phe Glu Cys
85 90 95

Lys Pro Gln Arg Val His Lys Leu Ile His Asn Ser His Asp Pro Ser
100 105 110

Tyr Phe Ala Cys Leu Ile Lys Glu Asp Ala Val His Arg Gln Ser Ile
115 120 125

Cys Tyr Val Phe Lys Ala Asp Asp Gln Thr Lys Val Pro Glu Ile Ile
130 135 140

Ser Ser Ile Arg Gln Ala Gly Lys Ile Ala Arg Gln Glu Glu Leu His
145 150 155 160

Cys Pro Ser Glu Phe Asp Asp Thr Phe Ser Lys Lys Phe Glu Val Leu
165 170 175

Phe Cys Gly Arg Val Thr Val Ala His Lys Lys Ala Pro Pro Ala Leu
180 185 190

Ile Asp Glu Cys Ile Glu Lys Phe Asn His Val Ser Gly Ser Arg Gly
195 200 205

Ser Glu Ser Pro Arg Pro Asn Pro Pro His Ala Ala Pro Thr Gly Ser
210 215 220

Gln Glu Pro Val Arg Arg Pro Met Arg Lys Ser Phe Ser Gln Pro Gly
225 230 235 240

Leu Arg Ser Leu Ala Phe Arg Lys Glu Leu Gln Asp Gly Gly Leu Arg
245 250 255

Ser Ser Gly Phe Phe Ser Ser Phe Glu Glu Ser Asp Ile Glu Asn His
260 265 270

Leu Ile Ser Gly His Asn Ile Val Gln Pro Thr Asp Ile Glu Glu Asn
275 280 285

Arg Thr Met Leu Phe Thr Ile Gly Gln Ser Glu Val Tyr Leu Ile Ser
290 295 300

Pro Asp Thr Lys Lys Ile Ala Leu Glu Lys Asn Phe Lys Glu Ile Ser
305 310 315 320

Phe Cys Ser Gln Gly Ile Arg His Val Asp His Phe Gly Phe Ile Cys
325 330 335

Arg Glu Ser Ser Gly Gly Gly Gly Phe His Phe Val Cys Tyr Val Phe
340 345 350

Gln Cys Thr Asn Glu Ala Leu Val Asp Glu Ile Met Met Thr Leu Lys
355 360 365

Gln Ala Phe Thr Val Ala Ala Val Gln Gln Thr Ala Lys Ala Pro Ala
370 375 380

Gln Leu Cys Glu Gly Cys Pro Leu Gln Ser Leu His Lys Leu Cys Glu
385 390 395 400

Arg Ile Glu Gly Met Asn Ser Ser Lys Thr Lys Leu Glu Leu Gln Lys
405 410 415

His Leu Thr Thr Leu Thr Asn Gln Glu Gln Ala Thr Ile Phe Glu Glu
420 425 430

Val Gln Lys Leu Arg Pro Arg Asn Glu Gln Arg Glu Asn Glu Leu Ile
435 440 445

Ile Ser Phe Leu Arg Cys Leu Tyr Glu Glu Lys Gln Lys Glu His Ile
450 455 460

His Ile Gly Glu Met Lys Gln Thr Ser Gln Met Ala Ala Glu Asn Ile
465 470 475 480

Gly Ser Glu Leu Pro Pro Ser Ala Thr Arg Phe Arg Leu Asp Met Leu
485 490 495

Lys Asn Lys Ala Lys Arg Ser Leu Thr Glu Ser Leu Glu Ser Ile Leu
500 505 510

Ser Arg Gly Asn Lys Ala Arg Gly Leu Gln Glu His Ser Ile Ser Val
515 520 525

Asp Leu Asp Ser Ser Leu Ser Ser Thr Leu Ser Asn Thr Ser Lys Glu
530 535 540

Pro Ser Val Cys Glu Lys Glu Ala Leu Pro Ile Ser Glu Ser Ser Phe
545 550 555 560

Lys Leu Leu Gly Ser Ser Glu Asp Leu Ser Ser Asp Ser Glu Ser His
565 570 575

Leu Pro Glu Glu Pro Ala Pro Leu Ser Pro Gln Gln Ala Phe Arg Arg
580 585 590

Arg Ala Asn Thr Leu Ser His Phe Pro Ile Glu Cys Gln Glu Pro Pro
595 600 605

Gln Pro Ala Arg Gly Ser Pro Gly Val Ser Gln Arg Lys Leu Met Arg
610 615 620

Tyr His Ser Val Ser Thr Glu Thr Pro His Glu Arg Lys Asp Phe Glu
625 630 635 640

Ser Lys Ala Asn His Leu Gly Asp Ser Gly Gly Thr Pro Val Lys Thr
645 650 655

Arg Arg His Ser Trp Arg Gln Gln Ile Phe Leu Arg Val Ala Thr Pro
660 665 670

Gln Lys Ala Cys Asp Ser Ser Ser Arg Tyr Glu Asp Tyr Ser Glu Leu
675 680 685

Gly Glu Leu Pro Pro Arg Ser Pro Leu Glu Pro Val Cys Glu Asp Gly
690 695 700

Pro Phe Gly Pro Pro Pro Glu Glu Lys Lys Arg Thr Ser Arg Glu Leu
705 710 715 720

Arg Glu Leu Trp Gln Lys Ala Ile Leu Gln Gln Ile Leu Leu Leu Arg
725 730 735

Met Glu Lys Glu Asn Gln Lys Leu Gln Ala Ser Glu Asn Asp Leu Leu
740 745 750

Asn Lys Arg Leu Lys Leu Asp Tyr Glu Glu Ile Thr Pro Cys Leu Lys
755 760 765

Glu Val Thr Thr Val Trp Glu Lys Met Leu Ser Thr Pro Gly Arg Ser
770 775 780

Lys Ile Lys Phe Asp Met Glu Lys Met His Ser Ala Val Gly Gln Gly
785 790 795 800

Val Pro Arg His His Arg Gly Glu Ile Trp Lys Phe Leu Ala Glu Gln
805 810 815

Phe His Leu Lys His Gln Phe Pro Ser Lys Gln Gln Pro Lys Asp Val
820 825 830

Pro Tyr Lys Glu Leu Leu Lys Gln Leu Thr Ser Gln Gln His Ala Ile
835 840 845

Leu Ile Asp Leu Gly Arg Thr Phe Pro Thr His Pro Tyr Phe Ser Ala
850 855 860

Gln Leu Gly Ala Gly Gln Leu Ser Leu Tyr Asn Ile Leu Lys Ala Tyr
865 870 875 880

Ser Leu Leu Asp Gln Glu Val Gly Tyr Cys Gln Gly Leu Ser Phe Val
885 890 895

Ala Gly Ile Leu Leu Leu His Met Ser Glu Glu Glu Ala Phe Lys Met
900 905 910

Leu Lys Phe Leu Met Phe Asp Met Gly Leu Arg Lys Gln Tyr Arg Pro
915 920 925

Asp Met Ile Ile Leu Gln Ile Gln Met Tyr Gln Leu Ser Arg Leu Leu
930 935 940

His Asp Tyr His Arg Asp Leu Tyr Asn His Leu Glu Glu His Glu Ile
945 950 955 960

Gly Pro Ser Leu Tyr Ala Ala Pro Trp Phe Leu Thr Met Phe Ala Ser
965 970 975

Gln Phe Pro Leu Gly Phe Val Ala Arg Val Phe Asp Met Ile Phe Leu
980 985 990

Gln Gly Thr Glu Val Ile Phe Lys Val Ala Leu Ser Leu Leu Gly Ser
995 1000 1005

His Lys Pro Leu Ile Leu Gln His Glu Asn Leu Glu Thr Ile Val Asp
1010 1015 1020

Phe Ile Lys Ser Thr Leu Pro Asn Leu Gly Leu Val Gln Met Glu Lys
1025 1030 1035 1040

Thr Ile Asn Gln Val Phe Glu Met Asp Ile Ala Lys Gln Leu Gln Ala
1045 1050 1055

Tyr Glu Val Glu Tyr His Val Leu Gln Glu Glu Leu Ile Asp Ser Ser
1060 1065 1070

Pro Leu Ser Asp Asn Gln Arg Met Asp Lys Leu Glu Lys Thr Asn Ser
1075 1080 1085

Ser Leu Arg Lys Gln Asn Leu Asp Leu Leu Glu Gln Leu Gln Val Ala
1090 1095 1100

Asn Gly Arg Ile Gln Ser Leu Glu Ala Thr Ile Glu Lys Leu Leu Ser
1105 1110 1115 1120

Ser Glu Ser Lys Leu Lys Gln Ala Met Leu Thr Leu Glu Leu Glu Arg
1125 1130 1135

Ser Ala Leu Leu Gln Thr Val Glu Glu Leu Arg Arg Arg Ser Ala Glu
1140 1145 1150

Pro Ser Asp Arg Glu Pro Glu Cys Thr Gln Ser Arg Ala His Gly Arg
1155 1160 1165

Leu Thr Ala Leu Gln Glu Arg Leu Gln His His Pro Thr Leu Ser Arg
1170 1175 1180

Pro
1185



(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 458 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: DOUBLE

(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: 5' EST

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 



ATAATAGGCA CTGAAGACAT GTTAATGGAA GGTGGATTTG TGATTCAGAA CCTCTAGACT 60

ACCTGGGCGA GTCTTTTAAA ATGTTTCTGC ATATGAAGTG TGTAAAATAG ATTGCTTGAT 120

CCAAAACAGA AAAACAGTGA TAACTGTTTT GCTGAGTTCC CAGACCCTTC CCAAGATGGA 180

ACCAATAACA TTCACAGCAA GGAAACATCT GCTTCCTAAC GAGGTCTCGG TGGATTTTGG 240

CCTGCAGCTG GTGGGCTCCC TGCCTGTGCA TTCCCTGACC ACCATGCCCA TGCTGCCCTG 300

GGTTGTGGCT GAGGTGCGAA GACTCAGCAG GCAGTCCACC AGAAAGGAAC CTGTAACCAN 360

GCAANTCCGG CTTTGCGTTT CACCCTCTGG ACTGAGATGT GAACCTGAGC CAGGGAGAAG 420

TCAACAGTGG GATCCCCTGA TCTATTCCAG CATCTTTG 458



(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-430-352 

(B) LOCATION: 1..47

(ix) FEATURE:

(A) NAME/KEY: polymorphic base

(B) LOCATION: 24

(D) OTHER INFORMATION: base c

(ix) FEATURE:

(A) NAME/KEY: Potential microsequencing oligo 99-430-352

(B) LOCATION: 1..23

(ix) FEATURE:

(A) NAME/KEY: Potential microsequencing oligo 99-430-352

(B) LOCATION: complement 25..47

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:

ACTTGCAAAC TAACACGTTA GCCCGCCACA GTTATCCTGC TGGTAGA 47



(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic fragment 99-430-352 

(B) LOCATION: 1..47 

(D) OTHER INFORMATION: variant version of SEQ ID7 

(ix) FEATURE: 

(A) NAME/KEY: polymorphic base 

(B) LOCATION: 24 

(D) OTHER INFORMATION: base t; c in SEQ ID7

(ix) FEATURE:

(A) NAME/KEY: Potential microsequencing oligo 99-430-352

(B) LOCATION: 1..23

(ix) FEATURE:

(A) NAME/KEY: Potential microsequencing oligo 99-430-352

(B) LOCATION: complement 25..47

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 

ACTTGCAAAC TAACACGTTA GCCTGCCACA GTTATCCTGC TGGTAGA 47
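
The feature tables of SEQ ID NO: 7 and SEQ ID NO: 8 describe a single polymorphic base at position 24 flanked by two potential microsequencing oligos. The following is a minimal sketch of how those coordinates can be checked against the sequences, assuming 1-based inclusive positions and that "complement" means the reverse complement of the stated range. The helper names `reverse_complement` and `feature` are hypothetical, and the strings for SEQ ID NO: 11 and SEQ ID NO: 12 are copied from the entries later in this listing.

```python
# Sequences copied from the listing, with the spacing removed.
SEQ_ID_7 = "ACTTGCAAACTAACACGTTAGCCCGCCACAGTTATCCTGCTGGTAGA"
SEQ_ID_8 = "ACTTGCAAACTAACACGTTAGCCTGCCACAGTTATCCTGCTGGTAGA"
SEQ_ID_11 = "GCAAACTAACACGTTAGCC"      # microsequencing oligo .mis1
SEQ_ID_12 = "TCTACCAGCAGGATAACTGTGGC"  # microsequencing oligo .mis2


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def feature(seq, start, end, complement=False):
    """Extract a 1-based inclusive range, optionally reverse-complemented."""
    fragment = seq[start - 1:end]
    return reverse_complement(fragment) if complement else fragment


# Positions (1-based) at which the two alleles differ.
differences = [
    i + 1 for i, (a, b) in enumerate(zip(SEQ_ID_7, SEQ_ID_8)) if a != b
]

allele_7 = feature(SEQ_ID_7, 24, 24)
allele_8 = feature(SEQ_ID_8, 24, 24)
upstream_oligo = feature(SEQ_ID_7, 1, 23)
downstream_oligo = feature(SEQ_ID_7, 25, 47, complement=True)

print(differences, allele_7, allele_8)
print(upstream_oligo.endswith(SEQ_ID_11), downstream_oligo == SEQ_ID_12)
```

Under these assumptions the two fragments differ only at position 24, the oligo of SEQ ID NO: 11 ends immediately 5' of the polymorphic base, and the oligo of SEQ ID NO: 12 is the reverse complement of positions 25..47, so its 3' end also abuts the polymorphic base on the opposite strand.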



(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens

(ix) FEATURE:

(A) NAME/KEY: upstream amplification primer for SEQ ID7, SEQ ID8

(B) LOCATION: 1..18

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:

CACCCATCGC ATATTTCC 18



(2) INFORMATION FOR SEQ ID NO: 10:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 18 base pairs

(B) TYPE: NUCLEIC ACID

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: DNA

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens

(ix) FEATURE:

(A) NAME/KEY: downstream amplification primer for SEQ ID7, SEQ ID8

(B) LOCATION: 1..18

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:

AGAAATGCTG ACTGTCTG 18



(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens

(ix) FEATURE:

(A) NAME/KEY: microsequencing oligo 99-430-352.mis1

(B) LOCATION: 1..19

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:

GCAAACTAAC ACGTTAGCC 19



(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens

(ix) FEATURE:

(A) NAME/KEY: potential microsequencing oligo 99-430-352.mis2

(B) LOCATION: 1..23

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:

TCTACCAGCA GGATAACTGT GGC 23



(2) INFORMATION FOR SEQ ID NO: 13:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE:

(A) NAME/KEY: primer for cDNA isolation

(B) LOCATION: 1..19

(ix) FEATURE:

(A) NAME/KEY: positions in SEQID3

(B) LOCATION: 271..289

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:

TGACCACCAT GCCCATGCT 19



(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(ix) FEATURE:

(A) NAME/KEY: primer for cDNA isolation

(B) LOCATION: 1..21

(ix) FEATURE:

(A) NAME/KEY: positions in SEQID3

(B) LOCATION: complement 3950..3930

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 



GCATTTATTC ACGTCCACGC C 
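
The "complement 3950..3930" location given for SEQ ID NO: 14 can be checked against the 3' untranslated region of SEQ ID NO: 3. The following is a minimal sketch, assuming that the 60-base line of SEQ ID NO: 3 whose running count ends at 3968 therefore starts at position 3909. The names `UTR_LINE`, `UTR_LINE_START` and `reverse_complement` are illustrative and not part of the listing.

```python
# One 60-base line from the 3' end of SEQ ID NO: 3 (running count 3968),
# copied from the listing with the spacing removed.
UTR_LINE = "CTGTACACAGTAGCTTTAGATGGCGTGGACGTGAATAAATGCAACTTATGTTTTAAAAAA"
UTR_LINE_START = 3968 - len(UTR_LINE) + 1  # assumed first position: 3909

SEQ_ID_14 = "GCATTTATTCACGTCCACGCC"  # primer for cDNA isolation


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


# A primer annotated as "complement" anneals to the listed strand, so its
# reverse complement should appear verbatim in SEQ ID NO: 3.
target = reverse_complement(SEQ_ID_14)
offset = UTR_LINE.find(target)  # 0-based offset within the line, -1 if absent

start = UTR_LINE_START + offset
end = start + len(target) - 1

print(target, start, end)
```

Under the stated assumption about the line start, the match spans positions 3930 to 3950 of SEQ ID NO: 3, which agrees with the location recorded in the feature table.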